Journal of ZheJiang University (Engineering Science)  2022, Vol. 56 Issue (11): 2215-2223    DOI: 10.3785/j.issn.1008-973X.2022.11.012
    
Remote sensing image target detection combining multi-scale and attention mechanism
Yun-zuo ZHANG1,2, Wei GUO1, Zhao-quan CAI3, Wen-bo LI1
1. School of Information Science and Technology, Shijiazhuang Tiedao University, Shijiazhuang 050043, China
2. Hebei Key Laboratory of Electromagnetic Environmental Effects and Information Processing, Shijiazhuang Tiedao University, Shijiazhuang 050043, China
3. Shanwei Institute of Technology, Shanwei 516600, China

Abstract  

Remote sensing images are characterized by complex backgrounds, large variations in target scale, and densely distributed targets, all of which degrade the performance of existing detection algorithms. To address this, a remote sensing image object detection algorithm combining multi-scale features and an attention mechanism was proposed. The atrous spatial pyramid pooling (ASPP) module was improved to enlarge the receptive field for images of different sizes. An attention module was proposed that learns the channel information and the spatial location information of the feature map, improving feature extraction for target regions of remote sensing images under complex backgrounds. A weighted bidirectional feature pyramid network (BiFPN) structure was combined with the backbone network to improve the fusion of multi-level features. A distance-based non-maximum suppression method was used for post-processing to reduce the overlapping of detection boxes. Experimental results on the DIOR and NWPU VHR-10 datasets showed that the mean average precision (mAP) of the proposed algorithm reached 71.6% and 91.6% respectively, which were 2.9 and 1.5 percentage points higher than those of the mainstream YOLOv5s algorithm. The proposed algorithm achieved better detection results on complex remote sensing images.
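The distance-based non-maximum suppression step can be illustrated with a minimal sketch. The abstract does not state the exact criterion, so the code below assumes a DIoU-style rule: a lower-scoring box is suppressed only when its IoU with the kept box, minus the normalized squared distance between the two box centers, exceeds a threshold. The function names, the box format (x1, y1, x2, y2), and the 0.5 threshold are illustrative assumptions, not values taken from the paper.

```python
def iou_and_center_penalty(a, b):
    """Return (IoU, d^2 / c^2) for two non-degenerate boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    iou = inter / (area_a + area_b - inter)
    # Squared distance between the two box centers.
    d2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 \
        + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both boxes.
    c2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 \
        + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    return iou, d2 / c2


def distance_nms(boxes, scores, threshold=0.5):
    """Greedy NMS with an assumed DIoU-style suppression criterion."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        survivors = []
        for i in order:
            iou, penalty = iou_and_center_penalty(boxes[best], boxes[i])
            # Boxes whose centers are far apart get a larger penalty,
            # so they are less likely to be suppressed.
            if iou - penalty <= threshold:
                survivors.append(i)
        order = survivors
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    print(distance_nms(boxes, [0.9, 0.8, 0.7]))
```

Under this assumed rule, two neighbouring boxes whose IoU is slightly above the threshold can both survive when their centers are far enough apart, which is the behaviour one would want for densely packed targets.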



Key words: remote sensing image, target detection, YOLOv5s algorithm, multi-scale feature, attention module, feature fusion, non-maximum suppression
Received: 30 November 2021      Published: 02 December 2022
CLC:  TP 751.1  
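Fund:  Key-Area Research and Development Program of Guangdong Province (2019B010137002); National Natural Science Foundation of China (61702347, 62027801); Natural Science Foundation of Hebei Province (F2022210007, F2017210161); Science and Technology Research Project of Higher Education Institutions of Hebei Province (ZD2022100, QN2017132); Central Government Guided Local Science and Technology Development Fund (226Z0501G)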
Cite this article:

Yun-zuo ZHANG, Wei GUO, Zhao-quan CAI, Wen-bo LI. Remote sensing image target detection combining multi-scale and attention mechanism. Journal of ZheJiang University (Engineering Science), 2022, 56(11): 2215-2223.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2022.11.012     OR     https://www.zjujournals.com/eng/Y2022/V56/I11/2215


Fig.1 Network structure block diagram of joint multiscale and attention mechanism algorithm
Fig.2 ASPP+ module
Fig.3 Attention module
Fig.4 BiFPN structure diagram
Fig.5 Sample DIOR data set
Fig.6 DIOR data set analysis
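The weighted fusion behind the BiFPN structure of Fig.4 can be sketched as follows. This page does not describe the fusion rule, so the code assumes the fast normalized fusion from the EfficientDet paper cited as [20]: each input feature gets a learnable scalar weight, the weights are passed through ReLU, and the weighted sum is divided by the sum of the weights plus a small epsilon. Feature maps are represented as flat lists of floats purely for illustration; the function name and the epsilon value are assumptions.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse equally sized feature vectors with non-negative normalized weights.

    features: list of equal-length lists of floats (one per input level)
    weights:  one scalar per input; negative values are clipped to zero
    """
    relu_w = [max(0.0, w) for w in weights]
    total = sum(relu_w) + eps  # eps avoids division by zero
    length = len(features[0])
    return [
        sum(w * f[i] for w, f in zip(relu_w, features)) / total
        for i in range(length)
    ]


if __name__ == "__main__":
    shallow = [1.0, 2.0]   # hypothetical feature values from one level
    deep = [3.0, 4.0]      # hypothetical feature values from another level
    print(fast_normalized_fusion([shallow, deep], [1.0, 1.0]))
```

With equal weights the result is close to the plain average of the inputs; a weight driven to zero or below removes that input from the fusion entirely.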
Category  Name                     Category  Name
C1        airplane                 C11       ground track field
C2        airport                  C12       harbor
C3        baseball field           C13       overpass
C4        basketball court         C14       ship
C5        bridge                   C15       stadium
C6        chimney                  C16       storage tank
C7        dam                      C17       tennis court
C8        expressway service area  C18       train station
C9        expressway toll station  C19       vehicle
C10       golf field               C20       wind mill
Tab.1 DIOR data set category information
Model          mAP   AP
                     C1/C11     C2/C12     C3/C13     C4/C14     C5/C15     C6/C16     C7/C17     C8/C18     C9/C19     C10/C20
RetinaNet[22]  65.7  53.7/74.2  77.3/50.7  69.0/59.6  81.3/71.2  44.1/69.3  72.3/44.8  62.5/81.3  76.2/54.2  66.0/45.1  77.7/83.4
PANet[23]      66.1  60.2/73.4  72.0/45.3  70.6/56.9  80.5/71.7  43.6/70.4  72.3/62.0  61.4/80.9  72.1/57.0  66.7/47.2  72.0/84.5
CBD-E[24]      67.8  54.2/79.5  77.0/47.5  71.5/59.3  87.1/69.1  44.6/69.7  75.4/64.3  63.5/84.5  76.2/59.4  65.3/44.7  79.3/83.1
YOLOv5s        68.7  78.3/73.1  65.0/58.3  74.3/57.4  90.6/91.8  44.3/67.9  80.1/82.7  48.9/89.1  57.7/49.7  63.2/55.4  68.6/78.1
Ours           71.6  85.8/75.7  74.2/59.9  78.9/58.6  89.8/89.7  46.1/71.9  77.8/78.7  60.5/89.5  65.1/55.4  65.3/56.4  75.6/78.1
Tab.2 Comparison of different algorithm models on DIOR test set %
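As a quick consistency check on Tab.2, the mAP column can be recomputed as the unweighted mean of the 20 per-class AP values. The sketch below does this for the YOLOv5s and Ours rows, using the numbers exactly as they appear in the table; the helper names are illustrative, and the assumption is that mAP here is a plain average over classes.

```python
# Per-class AP pairs (C1/C11, C2/C12, ..., C10/C20) copied from Tab.2.
OURS = [(85.8, 75.7), (74.2, 59.9), (78.9, 58.6), (89.8, 89.7), (46.1, 71.9),
        (77.8, 78.7), (60.5, 89.5), (65.1, 55.4), (65.3, 56.4), (75.6, 78.1)]
YOLOV5S = [(78.3, 73.1), (65.0, 58.3), (74.3, 57.4), (90.6, 91.8), (44.3, 67.9),
           (80.1, 82.7), (48.9, 89.1), (57.7, 49.7), (63.2, 55.4), (68.6, 78.1)]


def flatten(pairs):
    """Map the paired layout of Tab.2 to {'C1': ap, ..., 'C20': ap}."""
    aps = {}
    for i, (first, second) in enumerate(pairs, start=1):
        aps[f"C{i}"] = first
        aps[f"C{i + 10}"] = second
    return aps


def mean_ap(pairs):
    """Unweighted mean of the per-class AP values."""
    aps = flatten(pairs)
    return sum(aps.values()) / len(aps)


map_ours = mean_ap(OURS)
map_yolo = mean_ap(YOLOV5S)
gain = map_ours - map_yolo

# Class with the largest AP improvement over YOLOv5s.
ours, yolo = flatten(OURS), flatten(YOLOV5S)
best_class = max(ours, key=lambda c: ours[c] - yolo[c])

if __name__ == "__main__":
    print(f"Ours mAP={map_ours:.2f}, YOLOv5s mAP={map_yolo:.2f}, gain={gain:.2f}")
    print("largest per-class gain:", best_class)
```

The recomputed means agree with the reported 71.6 and 68.7 to within rounding, and the largest single-class gain falls on C7 (dam).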
Fig.7 Comparison of detection effect between YOLOv5s algorithm and proposed algorithm
Model          mAP   AP
                     airplane  ship  storage tank  baseball diamond  tennis court  basketball court  ground track field  harbor  bridge  vehicle
RetinaNet[22]  84.3  91.2      82.8  88.5          93.8              83.0          85.9              79.4                73.5    78.8    86.0
Ref. [25]      83.8  90.2      86.2  90.1          96.7              89.8          68.5              91.0                81.4    63.9    79.2
Ref. [26]      84.8  93.0      84.5  87.1          92.8              82.0          89.0              78.0                76.0    81.0    84.5
YOLOv5s        90.1  94.6      90.3  81.8          92.2              90.5          88.7              99.5                93.1    82.1    88.2
Ours           91.6  95.3      91.9  88.7          95.8              91.2          88.5              99.5                92.4    85.1    87.6
Tab.3 Comparison of different algorithm models on NWPU VHR-10 test set %
Configuration columns: Baseline, ASPP, ASPP+ (1,3,5), ASPP+ (3,6,9), ASPP+ (6,12,18), CBAM, AM
mAP/% (rows in original order): 68.7, 68.8, 69.1, 70.3, 69.8, 69.2, 70.9
Tab.4 Performance comparison of ASPP+ and attention module in terms of mAP
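To give a sense of what the dilation-rate triples compared in Tab.4 mean, the sketch below computes the effective kernel size of a single dilated convolution, k_eff = k + (k - 1)(d - 1), for each rate. It assumes 3×3 kernels, which is the usual choice in ASPP but is not stated on this page; the function name is illustrative.

```python
def effective_kernel(kernel, dilation):
    """Effective side length of a kernel x kernel convolution with the given dilation."""
    return kernel + (kernel - 1) * (dilation - 1)


# Dilation-rate triples listed in the header of Tab.4.
RATE_SETS = [(1, 3, 5), (3, 6, 9), (6, 12, 18)]

if __name__ == "__main__":
    for rates in RATE_SETS:
        sizes = [effective_kernel(3, d) for d in rates]  # assumes 3x3 kernels
        print(rates, "->", sizes)
```

Larger rates widen the area each branch sees without adding parameters, but they also sample the input more sparsely, which is one plausible reason a middle setting can outperform the largest one.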
Model                    P/%   R/%   mAP/%  FPS/(frame·s⁻¹)
YOLOv5s                  65.3  70.2  68.7   28.1
YOLOv5s-ASPP+            64.4  71.0  70.3   27.4
YOLOv5s-ASPP+-AM         63.7  72.2  70.9   25.9
YOLOv5s-ASPP+-AM-BiFPN   67.0  72.5  71.6   25.4
Tab.5 Experimental results after adding each module
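The trade-off shown in Tab.5 is easier to read as changes relative to the YOLOv5s baseline. The following sketch computes, for each added module, the cumulative mAP gain and FPS change, using only the values in the table; the variable and function names are illustrative.

```python
# Rows of Tab.5: (model, P/%, R/%, mAP/%, FPS).
ABLATION = [
    ("YOLOv5s", 65.3, 70.2, 68.7, 28.1),
    ("YOLOv5s-ASPP+", 64.4, 71.0, 70.3, 27.4),
    ("YOLOv5s-ASPP+-AM", 63.7, 72.2, 70.9, 25.9),
    ("YOLOv5s-ASPP+-AM-BiFPN", 67.0, 72.5, 71.6, 25.4),
]


def deltas_vs_baseline(rows):
    """Return (model, mAP change, FPS change) relative to the first row."""
    _, _, _, base_map, base_fps = rows[0]
    result = []
    for name, _p, _r, m_ap, fps in rows[1:]:
        result.append((name, round(m_ap - base_map, 1), round(fps - base_fps, 1)))
    return result


if __name__ == "__main__":
    for name, d_map, d_fps in deltas_vs_baseline(ABLATION):
        print(f"{name}: mAP {d_map:+.1f} points, FPS {d_fps:+.1f}")
```

Read this way, the full model gains 2.9 mAP points over the baseline at a cost of 2.7 frames per second.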
[1]   JIANG Xin, CHEN Wu-xiong, NIE Hai-tao, et al. Real-time ships target detection based on aerial remote sensing images[J]. Optics and Precision Engineering, 2020, 28(10): 2360-2369.
doi: 10.37188/OPE.20202810.2360
[2]   NIE Guang-tao, HUANG Hua. A survey of object detection in optical remote sensing images[J]. Acta Automatica Sinica, 2021, 47(8): 1749-1768.
doi: 10.16383/j.aas.c200596
[3]   WANG Chang, ZHANG Yong-sheng, WANG Xu, et al. Remote sensing image change detection method based on deep neural networks[J]. Journal of Zhejiang University: Engineering Science, 2020, 54(11): 2138-2148.
[4]   YANG X, SUN H, SUN X, et al. Position detection and direction prediction for arbitrary-oriented ships via multitask rotation region convolutional neural network[J]. IEEE Access, 2018, 6: 50839-50849.
doi: 10.1109/ACCESS.2018.2869884
[5]   FENG J, LIANG Y P, YE Z W, et al. Small object detection in optical remote sensing video with motion guided R-CNN[C]// IEEE International Geoscience and Remote Sensing Symposium. Waikoloa: IEEE, 2020: 272-275.
[6]   GUAN H Y, YU Y T, LI D L, et al. RoadCapsFPN: capsule feature pyramid network for road extraction from VHR optical remote sensing imagery[J]. IEEE Transactions on Intelligent Transportation Systems, 2021: 1-11.
[7]   COURTRAI L, PHAM M T, LEFEVRE S. Small object detection in remote sensing images based on super-resolution with auxiliary generative adversarial networks[J]. Remote Sensing, 2020, 12(19): 3152.
doi: 10.3390/rs12193152
[8]   ZHANG X D, ZHU K, CHEN G Z, et al. Geospatial object detection on high resolution remote sensing imagery based on double multi-scale feature pyramid network[J]. Remote Sensing, 2019, 11(7): 755.
doi: 10.3390/rs11070755
[9]   LI L L, CHENG L, GUO X H, et al. Deep adaptive proposal network in optical remote sensing images objective detection[C]// IEEE International Geoscience and Remote Sensing Symposium. Waikoloa: IEEE, 2020: 2651-2654.
[10]   CHEN C Y, GONG W G, CHEN Y L, et al. Object detection in remote sensing images based on a scene-contextual feature pyramid network[J]. Remote Sensing, 2019, 11(3): 339.
doi: 10.3390/rs11030339
[11]   HE W P, HUANG Z, WEI Z F, et al. TF-YOLO: an improved incremental network for real-time object detection[J]. Applied Sciences, 2019, 9(16): 3225.
doi: 10.3390/app9163225
[12]   SHAMSOLMOALI P, CHANUSSOT J, ZAREAPOOR M, et al. Multi-patch feature pyramid network for weakly supervised object detection in optical remote sensing images[J]. IEEE Transactions on Geoscience and Remote Sensing, 2021, 60: 1-13.
[13]   CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 834-848.
doi: 10.1109/TPAMI.2017.2699184
[14]   BERTASIUS G, TORRESANI L, YU S X, et al. Convolutional random walk networks for semantic image segmentation[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 858-866.
[15]   CHEN L C, PAPANDREOU G, SCHROFF F, et al. Rethinking atrous convolution for semantic image segmentation[EB/OL]. [2022-01-14]. https://arxiv.53yu.com/abs/1706.05587v3.
[16]   HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]// IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: Computer Vision Foundation, 2018: 7132-7141.
[17]   WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]// European Conference on Computer Vision. Berlin: Springer, 2018: 3-19.
[18]   ZHOU Yong, CHEN Si-lin, ZHAO Jia-qi, et al. Weakly semantic based attention network for interpretable object detection in remote sensing imagery[J]. Acta Electronica Sinica, 2021, 49(4): 679-689.
doi: 10.12263/DZXB.20200554
[19]   ZHANG Y N, KONG J, QI M, et al. Object detection based on multiple information fusion net[J]. Applied Sciences, 2020, 10(1): 418.
doi: 10.3390/app10010418
[20]   TAN M X, PANG R M, LE Q V. EfficientDet: scalable and efficient object detection[C]// IEEE Conference on Computer Vision and Pattern Recognition. Seattle: Computer Vision Foundation, 2020: 10778-10787.
[21]   LI K, WAN G, CHENG G, et al. Object detection in optical remote sensing images: a survey and a new benchmark[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2020, 159: 296-307.
doi: 10.1016/j.isprsjprs.2019.11.023
[22]   LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]// IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 2980-2988.
[23]   LIU S, QI L, QIN H F, et al. Path aggregation network for instance segmentation[C]// IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: Computer Vision Foundation, 2018: 8759-8768.
[24]   ZHANG J, XIE C M, XU X, et al. A contextual bidirectional enhancement method for remote sensing image object detection[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2020, 13: 4518-4531.
doi: 10.1109/JSTARS.2020.3015049
[25]   WANG C, BAI X, WANG S A, et al. Multiscale visual attention networks for object detection in VHR remote sensing images[J]. IEEE Geoscience and Remote Sensing Letters, 2018, 16(2): 310-314.