
    Adaptive mask resolution with collaborative edge optimization for wildlife instance segmentation

    • Abstract:
      Objective Wildlife instance segmentation is the basis for obtaining high-quality ecological parameters such as animal behavior tracking, habitat-use analysis, and population dynamics statistics. Existing models generally adopt an encoder-decoder architecture that extracts features through multi-level downsampling and then upsamples at a fixed rate back to the original resolution to generate masks. This standard pipeline not only destroys the structural continuity of object boundaries through downsampling and fixed-receptive-field convolutions, producing blurred and fragmented individual contours, but also fails to allocate computation in a difficulty-aware way according to contextual cues such as per-pixel occlusion complexity and edge sharpness, severely limiting the data reliability of high-precision ecological analyses such as behavioral and population-dynamics studies. To address these challenges, this study builds a comprehensive wildlife dataset of 4 231 images and proposes an instance segmentation framework that fuses dynamic resolution with edge enhancement, aiming to achieve accurate contour extraction of individual animals in complex field scenes through a difficulty-adaptive resource scheduling mechanism.
      Method The framework integrates deformable convolutions into a channel-attention deep adaptive module (SE-DAM) to accurately capture fine-grained spatial details and strengthen feature localization. An adaptive resolution module (ARM) dynamically adjusts the mask resolution according to instance scale and occlusion characteristics, significantly mitigating the spatial information loss caused by downsampling. A probability-driven collaborative optimization segmentation module (PDCO) uses the discrete cosine transform (DCT) to classify pixels by difficulty and models the interdependence between easy and hard regions, processing easy pixels first to guide the refinement of hard regions and thereby avoiding redundant foreground/background computation. The three modules form a closed optimization loop across feature extraction, resolution decision, and mask refinement, jointly improving segmentation robustness to occlusion, camouflage, and small targets. On the experimental dataset, the method's accuracy and efficiency advantages are verified through quantitative evaluation on multiple metrics (mAP, mI2oU, FPS), performance comparison with mainstream baseline models, and ablation studies.
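      The ARM-style decision described above can be sketched as a per-instance rule that maps box scale and occlusion to a mask resolution. This is a minimal illustration, not the paper's module: the candidate resolutions (28/56/112) and every threshold below are assumed values.

```python
# Hypothetical sketch of a difficulty-adaptive mask-resolution rule (ARM-style).
# Candidate resolutions and all thresholds are illustrative assumptions,
# not the paper's actual configuration.

def select_mask_resolution(box_w, box_h, occlusion_score,
                           candidates=(28, 56, 112)):
    """Return the side length of the square mask grid for one instance.

    box_w, box_h     -- instance bounding-box size in input pixels
    occlusion_score  -- assumed in [0, 1]; higher means more occluded
    """
    scale = (box_w * box_h) ** 0.5          # geometric mean of box sides
    # Larger or more heavily occluded instances get finer masks.
    if scale > 256 or occlusion_score > 0.7:
        return candidates[2]
    if scale > 96 or occlusion_score > 0.4:
        return candidates[1]
    return candidates[0]

# Small, unoccluded instance -> coarsest mask; large occluded one -> finest.
print(select_mask_resolution(40, 50, 0.1))    # 28
print(select_mask_resolution(300, 280, 0.8))  # 112
```

      A learned ARM would replace these hand-set thresholds with a predictor driven by instance features; the sketch only shows the coarse-to-fine resource-allocation idea.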
      Result With a ResNet-101-FPN backbone, the method reaches an mAP of 53.7% and an mI2oU of 86.71%, improvements of 11.2 and 7.7 percentage points over the Mask R-CNN baseline, while the computational cost rises by only 0.3 GFLOPs and the frame rate holds steady at 12.4 frames/s, confirming an effective balance between accuracy and efficiency. With a Swin-B backbone, the mAP further reaches 54.8%, clearly surpassing state-of-the-art methods such as Mask Transfiner (50.8%) and QueryInst (52.4%) and demonstrating good backbone generalization. In addition, compared with methods using a fixed 112 × 112-pixel mask resolution, the dynamic resolution strategy costs only 1.31 GFLOPs, significantly lower than the compared methods, and maintains excellent segmentation performance in complex scenes such as stacked similar individuals, branch-and-leaf occlusion, and infrared imagery, confirming the method's strong adaptability to field monitoring environments.
      Conclusion The proposed wildlife instance segmentation method, which fuses dynamic mask resolution with edge detail enhancement, uses a difficulty-aware adaptive resource allocation mechanism to effectively resolve the boundary blur and computational redundancy caused by the fixed resolution of conventional methods. It delivers significantly improved recognition accuracy and contour quality in challenging scenarios involving occlusion, camouflage, and small targets, providing a directly deployable technical solution for efficient, high-precision automatic wildlife monitoring and ecological research, with clear application value for advancing intelligent ecological conservation.

       

      Abstract:
      Objective Wildlife instance segmentation underpins critical ecological analyses, including behavior tracking, habitat utilization assessment, and population dynamics monitoring. Existing encoder-decoder architectures extract features via multi-level downsampling and then apply fixed upsampling to generate masks. This pipeline degrades boundary continuity, yielding blurry and fragmented contours, and lacks difficulty-aware resource allocation based on pixel-wise occlusion complexity and edge sharpness. These limitations severely compromise data fidelity for high-precision ecological research.
      Method To address these challenges, we construct a comprehensive wildlife dataset comprising 4 231 images and propose a novel instance segmentation framework that fuses dynamic resolution with edge enhancement. Our approach enables accurate animal contour extraction through difficulty-adaptive resource scheduling. The framework integrates deformable convolutions within a Channel Attention Deep Adaptive Module (SE-DAM) to capture fine-grained spatial details. An Adaptive Resolution Module (ARM) dynamically adjusts mask resolution based on instance scale and occlusion characteristics, mitigating spatial information loss. Additionally, a Probability-Driven Collaborative Optimization (PDCO) module employs Discrete Cosine Transform (DCT) for pixel difficulty classification and models interdependencies between easy and hard regions. By prioritizing simple pixels to guide the refinement of complex regions, PDCO avoids redundant foreground/background computations. These three modules form a closed-loop optimization across feature extraction, resolution decision, and mask refinement, collectively enhancing robustness to occlusion, camouflage, and small targets.
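      The PDCO difficulty test can be illustrated with a toy DCT-based classifier: a patch is flagged "hard" when its high-frequency DCT energy dominates. The patch size, the anti-diagonal frequency split, and the 0.2 ratio threshold are all hypothetical choices for illustration, not the paper's actual criterion.

```python
import math

def dct2(patch):
    """Naive (unnormalized) 2-D DCT-II of a square patch given as lists."""
    n = len(patch)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (patch[x][y]
                          * math.cos(math.pi * (2 * x + 1) * u / (2 * n))
                          * math.cos(math.pi * (2 * y + 1) * v / (2 * n)))
            out[u][v] = s
    return out

def is_hard_patch(patch, hf_ratio_thresh=0.2):
    """Flag a patch as 'hard' when high-frequency DCT energy dominates."""
    coeffs = dct2(patch)
    n = len(coeffs)
    total = hf = 0.0
    for u in range(n):
        for v in range(n):
            if u == 0 and v == 0:
                continue                      # skip the DC term
            e = coeffs[u][v] ** 2
            total += e
            if u + v >= n:                    # upper anti-diagonal = high freq
                hf += e
    return total > 0 and hf / total > hf_ratio_thresh

# A checkerboard patch (sharp edges) is hard; a smooth ramp is easy.
print(is_hard_patch([[(x + y) % 2 for y in range(4)] for x in range(4)]))   # True
print(is_hard_patch([[float(x + y) for y in range(4)] for x in range(4)]))  # False
```

      In the actual pipeline such a test would route only the hard pixels into the expensive refinement branch, with predictions from the easy regions guiding them, which is the redundancy-avoidance idea the Method describes.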
      Result Experimental results demonstrate that with a ResNet-101-FPN backbone, our method achieves an mAP of 53.7% and an mI2oU of 86.71%, surpassing Mask R-CNN by 11.2 and 7.7 percentage points, respectively. The computational cost increases by only 0.3 GFLOPs with an inference speed of 12.4 FPS, showing an effective balance between accuracy and efficiency. Using a Swin-B backbone, the mAP reaches 54.8%, outperforming state-of-the-art methods such as Mask Transfiner (50.8%) and QueryInst (52.4%). Our dynamic resolution strategy maintains superior performance in challenging scenarios, including overlapping individuals, branch occlusion, and infrared imagery, while requiring only 1.31 GFLOPs, significantly less than fixed 112 × 112 mask-resolution approaches.
      Conclusion We present a wildlife instance segmentation method that integrates dynamic mask resolution with edge detail enhancement. Through a difficulty-aware adaptive resource allocation mechanism, our approach effectively mitigates boundary blur and computational redundancy inherent in conventional fixed-resolution methods. This yields significant improvements in recognition accuracy and contour quality under challenging conditions including occlusion, camouflage, and small targets. The framework provides a deployable solution for efficient, high-precision automatic wildlife monitoring and ecological research, offering clear practical value for intelligent ecological conservation.

       

