Remote sensing image classification based on a self-attention convolutional network

李彦甫; 范习健; 杨绪兵; 徐新洲

doi:10.12171/j.1000-1522.20210196

Remote sensing image classification framework based on self-attention convolutional neural network

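Abstract (translated from the Chinese):
Objective Remote sensing image classification plays a vital role in forestry monitoring tasks such as forest resource surveys, ecological engineering planning, and forest pest and disease control. This study introduces self-attention modules to strengthen the ability of convolutional networks to characterize remote sensing image features, with the aim of improving classification performance.
Method This paper proposes a remote sensing image classification method that combines a self-attention mechanism with a residual convolutional network. A convolutional neural network first extracts rich deep texture and semantic features; multi-head self-attention modules are then embedded in the last three bottleneck layers of the network to capture the complex global structure of remote sensing images. Embedding self-attention modules in the convolutional classification network effectively improves classification accuracy. The method was trained and tested with the PyTorch deep learning library on three public remote sensing image datasets, RSSCN7, EuroSAT and PatternNet, and compared with existing classification frameworks in terms of accuracy and performance. The proposed method was also trained with different batch sizes and different amounts of data, and its classification performance was tested in each case.
Result The proposed method achieved average recognition rates of 91.30%, 97.88% and 97.37% on the three remote sensing classification datasets, respectively. On the first two datasets this is an improvement of 2.26% and 3.73%, respectively, over existing algorithms based on deep convolutional networks. The total number of parameters is 2.08 × 10⁷, which is 5.2 × 10⁶ fewer than the existing method with the fewest parameters.
Conclusion Compared with existing remote sensing image classification frameworks, the proposed method achieves more accurate classification in a graphics processing unit (GPU) accelerated environment. It also effectively reduces the number of model parameters and improves execution efficiency, which facilitates subsequent practical deployment.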
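
The parameter figures in the Result paragraph can be checked with some back-of-the-envelope arithmetic. The sketch below first derives the parameter count implied for the smallest existing method from the two numbers in the abstract. It then estimates how many weights would be saved by swapping a convolution for attention in three bottleneck layers. That second part rests on assumptions not stated in the abstract: an intermediate layer that is a bias-free 3 × 3 convolution with 512 channels (as in a standard ResNet50 last stage), and an attention module whose only weights are three bias-free 1 × 1 projections for queries, keys and values. The helper `conv_params` and the constants are hypothetical names used only for illustration.

```python
def conv_params(in_ch: int, out_ch: int, kernel: int) -> int:
    """Number of weights in a bias-free 2D convolution."""
    return in_ch * out_ch * kernel * kernel


# Figures taken from the abstract.
PROPOSED_PARAMS = 20_800_000      # 2.08 x 10^7
REDUCTION = 5_200_000             # 5.2 x 10^6 fewer than the smallest existing method
lowest_existing = PROPOSED_PARAMS + REDUCTION

# Assumed layer sizes (not given in the abstract).
MID_CHANNELS = 512                # width of the intermediate layer
BLOCKS = 3                        # "last three bottleneck layers" (from the abstract)

conv3x3 = conv_params(MID_CHANNELS, MID_CHANNELS, 3)        # layer being replaced
attention = 3 * conv_params(MID_CHANNELS, MID_CHANNELS, 1)  # q, k, v projections
saved_vs_plain_backbone = BLOCKS * (conv3x3 - attention)

print(f"implied size of smallest existing method: {lowest_existing:,}")
print(f"estimated weights saved by the swap:      {saved_vs_plain_backbone:,}")
```

Under these assumptions the swap removes roughly 4.7 × 10⁶ weights relative to an unmodified backbone. This estimate should not be equated with the 5.2 × 10⁶ reported in the abstract, which is measured against a different baseline (the existing method with the fewest parameters) that the abstract does not name.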

Abstract:
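Objective Remote sensing image classification plays a vital role in forestry monitoring tasks such as forest resource surveys, ecological engineering planning, and forest pest and disease control. This study introduces self-attention modules to strengthen the feature representation of convolutional networks for remote sensing images, with the aim of improving classification performance.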
Method This work proposes a remote sensing image classification method based on multi-head self-attention modules, using the convolutional neural network ResNet50 as the backbone. The intermediate layer in each of the last three bottleneck layers of ResNet50 is replaced with a multi-head self-attention module, which lets the model focus on the most discriminative regions and thus improves classification accuracy. The framework was trained and tested with the PyTorch deep learning library on three publicly available datasets, RSSCN7, EuroSAT and PatternNet, and its accuracy was compared with that of existing classification frameworks. The framework was also trained with different batch sizes to test the effect on classification.
Result The proposed method achieved average recognition rates of 91.30%, 97.88% and 97.37% on the three remote sensing classification datasets, respectively, outperforming existing algorithms based on deep convolutional networks. The total number of parameters was 2.08 × 10⁷, which is 5.2 × 10⁶ fewer than the existing method with the fewest parameters.
Conclusion Compared with existing remote sensing image classification frameworks, the proposed framework achieves higher classification accuracy in a GPU-accelerated environment while reducing both the number of parameters and the video memory consumption.
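
The replacement described in the Method paragraph can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the class names `MHSA2d` and `AttentionBottleneck`, the choice of 4 heads, and the channel sizes are all assumptions made for illustration. The sketch also omits positional encodings and strided downsampling, which a full model may need. It only shows the general idea of putting multi-head self-attention where a bottleneck's intermediate layer would normally sit.

```python
import torch
import torch.nn as nn


class MHSA2d(nn.Module):
    """Multi-head self-attention over all spatial positions of a feature map."""

    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        if channels % heads != 0:
            raise ValueError("channels must be divisible by heads")
        self.heads = heads
        # One 1x1 convolution produces queries, keys and values for every pixel.
        self.qkv = nn.Conv2d(channels, channels * 3, kernel_size=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        d = c // self.heads  # channels per head
        # (B, 3C, H, W) -> three tensors of shape (B, heads, d, H*W)
        q, k, v = self.qkv(x).reshape(b, 3, self.heads, d, h * w).unbind(dim=1)
        # Scaled dot-product scores between every pair of positions: (B, heads, HW, HW)
        attn = torch.softmax(q.transpose(-2, -1) @ k / d ** 0.5, dim=-1)
        # Each output position is an attention-weighted sum of the values.
        out = v @ attn.transpose(-2, -1)  # (B, heads, d, HW)
        return out.reshape(b, c, h, w)


class AttentionBottleneck(nn.Module):
    """Residual bottleneck whose intermediate layer is self-attention (assumed layout)."""

    def __init__(self, in_ch: int, mid_ch: int, heads: int = 4):
        super().__init__()
        out_ch = mid_ch * 4  # ResNet-style expansion factor
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 1, bias=False),
            nn.BatchNorm2d(mid_ch),
            nn.ReLU(inplace=True),
            MHSA2d(mid_ch, heads),  # stands in for the usual intermediate convolution
            nn.BatchNorm2d(mid_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        # Project the shortcut only when the channel count changes.
        if in_ch == out_ch:
            self.shortcut = nn.Identity()
        else:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, bias=False),
                nn.BatchNorm2d(out_ch),
            )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.relu(self.body(x) + self.shortcut(x))


if __name__ == "__main__":
    block = AttentionBottleneck(in_ch=1024, mid_ch=512).eval()
    with torch.no_grad():
        y = block(torch.randn(2, 1024, 8, 8))
    print(y.shape)
```

Because the attention matrix has one entry per pair of spatial positions, its memory cost grows quadratically with the feature-map area. Restricting attention to the last, lowest-resolution layers of the network keeps that cost manageable.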
