Wildlife image screening for infrared cameras based on YOLOv7

齐建东; 马鐘添; 郑尚姿

doi:10.12171/j.1000-1522.20230112

Abstract:

Objective Wild environments typically have dense vegetation and cluttered trees, and infrared cameras are also affected by environmental conditions, weather, and lighting. As a result, the cameras are prone to false triggering and capture large numbers of unusable images, which take considerable manual effort to screen out. To address this problem, this study made lightweight improvements to the YOLOv7 model so that unusable images can be screened out automatically.
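The screening step described in the objective can be illustrated with a minimal sketch. It assumes that an image counts as unusable when the detector reports no animal above a confidence threshold; the abstract does not state the actual decision rule, so the `Detection` type, the `screen` function, and the 0.5 threshold are hypothetical names and values chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detector output for an image (hypothetical structure)."""
    label: str
    confidence: float


def is_unusable(detections, conf_threshold=0.5):
    """Return True when no detection reaches the confidence threshold."""
    return not any(d.confidence >= conf_threshold for d in detections)


def screen(images, conf_threshold=0.5):
    """Split a {filename: [Detection, ...]} mapping into kept and discarded names."""
    kept, discarded = [], []
    for name, detections in images.items():
        if is_unusable(detections, conf_threshold):
            discarded.append(name)
        else:
            kept.append(name)
    return kept, discarded
```

In practice the threshold would be tuned on validation data, since a lower value keeps more true animal images at the cost of letting more false triggers through.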

Method This study constructed a dataset of 2 172 wildlife images collected in the Wuling Mountain Nature Reserve in Miyun, Beijing, during 2014−2015, and annotated the position of each animal in the images. The YOLOv7 network was improved in several ways. MicroBlock was introduced to replace the YOLOv7 backbone network, and a lightweight SPPCSPC structure was used to reduce the number of model parameters. SIoU loss, LNDown downsampling, and BiFPN were adopted to improve the model's ability to detect animals. For comparison with the proposed model, the YOLOv5-m, YOLOv5-l, Ghost-YOLOv5-l, YOLOv6, YOLOX-M, and YOLOR-CSP models were trained and validated on a subset of the Snapshot Serengeti camera trap dataset containing 10 000 images, and their screening performance on wildlife images was compared. Transfer learning was then used to train on the self-built wildlife dataset, and the effect of freezing different numbers of layers was tested.
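SIoU belongs to the family of bounding-box losses built on intersection over union (IoU). The abstract does not give the SIoU formula, so the sketch below computes only plain IoU for two axis-aligned boxes; the additional penalty terms that SIoU adds are not reproduced here. The function name and the `(x1, y1, x2, y2)` box layout are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Plain IoU of two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union area = sum of both areas minus the overlap counted twice.
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```

A loss of the form `1 - iou(pred, target)` is the usual starting point for this family; it is shown here only as background, not as the loss used in this study.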

Result Compared with the original YOLOv7 network, the improved model reduced inference time by 14.3%, floating-point operations per second (FLOPS) by 33.5%, and the number of parameters by 17.8%, and it also outperformed YOLOv7 in terms of false detections. Compared with the other models, the improved YOLOv7 did not achieve the best result on every metric, but it struck a better balance between detection time and accuracy. On the self-built dataset, fine-tuning with no frozen weights performed best, with an average precision 12.6% higher than that of the model trained without transfer learning.
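The percentage reductions above are relative to the YOLOv7 baseline. The abstract does not report the absolute values, so the short sketch below only shows how such a figure maps to and from absolute values; the function names are hypothetical, and any baseline value passed in is an illustrative placeholder rather than a measurement from this study.

```python
def apply_reduction(baseline, reduction_pct):
    """Value that remains after reducing `baseline` by `reduction_pct` percent."""
    return baseline * (1.0 - reduction_pct / 100.0)


def relative_reduction_pct(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline * 100.0
```

For example, with a hypothetical baseline of 100 units, a 14.3% reduction leaves about 85.7 units, and `relative_reduction_pct(100.0, 85.7)` recovers a value of about 14.3.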

Conclusion This study provides a faster and more accurate screening solution for the wildlife monitoring network in the Miyun area of Beijing.
