Journal of ZheJiang University (Engineering Science)  2026, Vol. 60 Issue (4): 763-771    DOI: 10.3785/j.issn.1008-973X.2026.04.008
    
Small object detection algorithm for optical remote sensing images based on fusion attention mechanism
Yaolian SONG1, Chi PENG1, Jingmin TANG1,*, Xuanzhi ZHAO1, Guicai YU2
1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
2. School of Physics and Electronic Information Engineering, Qinghai Minzu University, Xining 810007, China

Abstract  

A small object detection algorithm, FMCM-YOLO, based on feature enhancement and a fusion attention mechanism was proposed. It addresses three problems in small object detection for optical remote sensing images: limited feature extraction, foreground-background confusion, and frequent missed and false detections. Firstly, a four-head detection model was designed by adding a small object detection layer, so that the numerous small objects in optical remote sensing images can be detected. Secondly, a feature enhancement module was proposed for the backbone network. It improves feature extraction through a multi-branch convolutional structure that uses dilated convolutions of different sizes. Thirdly, channel and spatial attention mechanisms were incorporated into the neck network together with a residual structure. This focuses the model on small objects and makes targets easier to distinguish from the background. Finally, MPDIoU was adopted as the bounding box regression loss to accelerate convergence and improve detection of small objects. Experimental results show that the proposed algorithm reached mAP50 values of 89.9% and 60.6% on the two public datasets USOD and AI-TOD, respectively. These are 2.8 and 5.9 percentage points higher than the baseline algorithm YOLOv5m. In particular, on AI-TOD the mean average precision for very tiny, tiny, and small objects increased by 2.1, 6.5, and 5.1 percentage points, respectively. These results show that FMCM-YOLO effectively improves small target detection performance in optical remote sensing images.



Key words: optical remote sensing image; small target detection; YOLOv5; feature enhancement; attention mechanism
Received: 26 July 2025      Published: 19 March 2026
CLC:  TP 753  
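Fund:  National Natural Science Foundation of China (62261056); Fund of the National Defense Science and Technology Key Laboratory (23JCJQLB3301); Open Fund of the Hanjiang International National Laboratory (KF2024025); Industry-University Cooperative Education Program of the Ministry of Education (231107173102719).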
Corresponding Authors: Jingmin TANG     E-mail: 39217149@qq.com;tang_min213@163.com
Cite this article:

Yaolian SONG,Chi PENG,Jingmin TANG,Xuanzhi ZHAO,Guicai YU. Small object detection algorithm for optical remote sensing images based on fusion attention mechanism. Journal of ZheJiang University (Engineering Science), 2026, 60(4): 763-771.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2026.04.008     OR     https://www.zjujournals.com/eng/Y2026/V60/I4/763


Fig.1 Structure of FMCM-YOLO
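The sketch below illustrates what the added small object detection layer does. It assumes YOLOv5-style strides of 8, 16 and 32 for the three original heads and a stride of 4 for the added head. These strides and the helper names `head_grid_sizes` and `cells_spanned` are illustrative assumptions, not details confirmed by this page. At the 640×640 input size in Tab.1, a stride-4 head gives a 160×160 grid, so an 8-pixel object spans two grid cells instead of one.

```python
# Illustrative sketch, not the paper's code. ASSUMPTION: the baseline heads use
# YOLOv5-style strides 8, 16 and 32, and the added small-object head uses stride 4.


def head_grid_sizes(img_size, strides):
    """Return {stride: grid side length} for a square input of side img_size pixels."""
    grids = {}
    for s in strides:
        if img_size % s != 0:
            raise ValueError(f"input size {img_size} is not divisible by stride {s}")
        grids[s] = img_size // s
    return grids


def cells_spanned(obj_side_px, stride):
    """Number of grid cells an object of side obj_side_px spans along one axis."""
    return obj_side_px / stride


if __name__ == "__main__":
    print(head_grid_sizes(640, [8, 16, 32]))     # {8: 80, 16: 40, 32: 20}
    print(head_grid_sizes(640, [4, 8, 16, 32]))  # {4: 160, 8: 80, 16: 40, 32: 20}
    # An 8-pixel object spans one cell at stride 8 but two cells at stride 4.
    print(cells_spanned(8, 8), cells_spanned(8, 4))  # 1.0 2.0
```

The finer grid costs computation, which is consistent with the higher GFLOPs reported for the full model in Tab.2.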
Fig.2 Structure of MFFM
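According to the abstract, MFFM combines a multi-branch convolutional structure with dilated convolutions of different sizes. The branch settings are not given on this page. The sketch below therefore assumes 3×3 kernels, stride 1, and hypothetical dilation rates of 1, 3 and 5. For each branch it computes:

- the effective kernel extent, k + (k - 1)(d - 1);
- the padding d(k - 1)/2 that keeps every branch output the same size, so the branches can be fused.

The output-size helper follows the shape formula documented for PyTorch's `Conv2d`. All function names are hypothetical.

```python
# Sketch of receptive-field arithmetic for a multi-branch dilated-convolution block.
# ASSUMPTION: 3x3 kernels, stride 1, hypothetical dilation rates (1, 3, 5);
# the actual MFFM branch configuration is not stated on this page.


def effective_kernel(k, d):
    """Effective extent of a k x k kernel with dilation d: k + (k - 1) * (d - 1)."""
    return k + (k - 1) * (d - 1)


def same_padding(k, d):
    """Padding that preserves spatial size for an odd kernel size k and stride 1."""
    if k % 2 == 0:
        raise ValueError("symmetric 'same' padding needs an odd kernel size")
    return d * (k - 1) // 2


def conv_out_size(n, k, d, p, s=1):
    """Output length along one axis: floor((n + 2p - d(k - 1) - 1) / s) + 1."""
    return (n + 2 * p - d * (k - 1) - 1) // s + 1


if __name__ == "__main__":
    feat = 80  # e.g. an 80 x 80 feature map
    for d in (1, 3, 5):
        p = same_padding(3, d)
        print(d, effective_kernel(3, d), p, conv_out_size(feat, 3, d, p))
    # 1 3 1 80
    # 3 7 3 80
    # 5 11 5 80
```

Larger dilation lets a branch see more context around a small object without adding weights. Every branch still has 3×3 = 9 weights per input-output channel pair.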
Fig.3 Structure of CASABlock
Fig.4 Structure of CASA
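The abstract describes CASA as channel and spatial attention combined with a residual structure in the neck. Below is a minimal pure-Python sketch of that idea. It borrows the sequential channel-then-spatial layout of CBAM [13] and adds the block input back at the end.

Several details are assumptions, not the paper's CASA design:

- the channel-then-spatial ordering;
- the bias-free shared MLP;
- the zero-padded spatial convolution (CBAM uses 7×7; the demo uses 3×3 to stay small);
- every name in the code.

A real implementation would operate on framework tensors rather than nested lists.

```python
import math

# Illustrative CBAM-style block with an identity shortcut. ASSUMPTIONS: channel
# attention precedes spatial attention (as in CBAM), the shared MLP and the spatial
# kernel have no bias, and a residual connection adds the input back. Tensors are
# nested lists of shape C x H x W; all names are hypothetical.


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(w, v):
    """Multiply matrix w (rows as lists) by vector v."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def channel_attention(x, w1, w2):
    """Reweight channels using avg- and max-pooled descriptors through a shared MLP."""
    avg = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    mx = [max(max(row) for row in ch) for ch in x]

    def mlp(v):
        hidden = [max(0.0, h) for h in matvec(w1, v)]  # ReLU
        return matvec(w2, hidden)

    weights = [sigmoid(a + b) for a, b in zip(mlp(avg), mlp(mx))]
    return [[[weights[c] * val for val in row] for row in ch] for c, ch in enumerate(x)]


def spatial_attention(x, kernel):
    """Reweight pixels using a k x k conv (zero padding) over channel-wise mean and max maps."""
    n_ch, h, w = len(x), len(x[0]), len(x[0][0])
    mean_map = [[sum(x[c][i][j] for c in range(n_ch)) / n_ch for j in range(w)] for i in range(h)]
    max_map = [[max(x[c][i][j] for c in range(n_ch)) for j in range(w)] for i in range(h)]
    pooled = (mean_map, max_map)
    k = len(kernel[0])
    r = k // 2
    att = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for m in range(2):
                for u in range(k):
                    for v in range(k):
                        ii, jj = i + u - r, j + v - r
                        if 0 <= ii < h and 0 <= jj < w:
                            acc += kernel[m][u][v] * pooled[m][ii][jj]
            att[i][j] = sigmoid(acc)
    return [[[att[i][j] * ch[i][j] for j in range(w)] for i in range(h)] for ch in x]


def casa_block(x, w1, w2, kernel):
    """Channel attention, then spatial attention, then add the input back (residual)."""
    y = spatial_attention(channel_attention(x, w1, w2), kernel)
    return [[[a + b for a, b in zip(rx, ry)] for rx, ry in zip(cx, cy)] for cx, cy in zip(x, y)]


if __name__ == "__main__":
    x = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],
         [[-1.0, 0.0, 1.0], [2.0, -2.0, 0.0], [3.0, 3.0, 3.0]]]
    w1 = [[0.0, 0.0]]    # C=2 -> hidden=1 (reduction ratio 2)
    w2 = [[0.0], [0.0]]  # hidden=1 -> C=2
    kernel = [[[0.0] * 3 for _ in range(3)] for _ in range(2)]  # 2 x 3 x 3
    out = casa_block(x, w1, w2, kernel)
    # With all-zero weights both attention maps equal sigmoid(0) = 0.5,
    # so the block returns x + 0.25 * x = 1.25 * x.
    print(out[0][0])  # [1.25, 2.5, 3.75]
```

The identity shortcut means that even a poorly trained attention path cannot suppress the original features entirely. This is one plausible reason for pairing attention with a residual structure.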
Fig.5 Parameters of MPDIoU loss function
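MPDIoU, as defined in [14], extends IoU with two distance terms:

- d1², the squared distance between the top-left corners of the predicted and ground-truth boxes;
- d2², the squared distance between their bottom-right corners.

Each term is normalized by w² + h², where w and h are the width and height of the input image. This gives MPDIoU = IoU - d1²/(w² + h²) - d2²/(w² + h²), and the loss is L = 1 - MPDIoU.

A minimal scalar sketch for corner-format boxes follows. Training code would vectorize it over tensors and typically add a small epsilon to the union.

```python
# Minimal scalar sketch of the MPDIoU loss (after ref. [14]).
# Boxes are (x1, y1, x2, y2) in pixels with x2 > x1 and y2 > y1.


def iou(a, b):
    """Intersection over union of two corner-format boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def mpdiou_loss(pred, gt, img_w, img_h):
    """1 - MPDIoU, with corner distances normalized by the squared image diagonal."""
    d1_sq = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2  # top-left corners
    d2_sq = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2  # bottom-right corners
    norm = img_w ** 2 + img_h ** 2
    mpdiou = iou(pred, gt) - d1_sq / norm - d2_sq / norm
    return 1.0 - mpdiou


if __name__ == "__main__":
    gt = (100, 100, 120, 120)  # a 20 x 20 px object in a 640 x 640 image
    print(mpdiou_loss(gt, gt, 640, 640))                    # 0.0
    print(mpdiou_loss((200, 200, 220, 220), gt, 640, 640))  # 1.048828125
```

In the second case the boxes do not overlap. A plain IoU loss would stay at 1 no matter how far apart they are, but the corner-distance terms keep growing with the separation and still provide a training signal.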
Fig.6 Size distribution of target instances in the datasets
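The per-size results reported for AI-TOD in Tab.4 follow that benchmark's size classes. The sketch below assumes the thresholds introduced with AI-TOD [16], with an object's size taken as the square root of its box area in pixels:

- very tiny: 2 to 8;
- tiny: 8 to 16;
- small: 16 to 32;
- medium: 32 to 64.

The half-open interval boundaries and the function names are assumptions made here. The code bins box sizes the way one would to plot a size distribution.

```python
import math

# ASSUMED size classes (AI-TOD convention): size = sqrt(w * h) in pixels,
# intervals taken as half-open [low, high).
SIZE_BINS = (
    ("very tiny", 2, 8),
    ("tiny", 8, 16),
    ("small", 16, 32),
    ("medium", 32, 64),
)


def size_class(w, h):
    """Return the size class name for a w x h box, or 'other' if it falls outside all bins."""
    s = math.sqrt(w * h)
    for name, low, high in SIZE_BINS:
        if low <= s < high:
            return name
    return "other"


def size_histogram(boxes):
    """Count (w, h) boxes per size class."""
    counts = {name: 0 for name, _, _ in SIZE_BINS}
    counts["other"] = 0
    for w, h in boxes:
        counts[size_class(w, h)] += 1
    return counts


if __name__ == "__main__":
    boxes = [(4, 4), (6, 10), (12, 12), (20, 30), (50, 40), (100, 80)]
    print(size_histogram(boxes))
    # {'very tiny': 2, 'tiny': 1, 'small': 1, 'medium': 1, 'other': 1}
```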
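| Parameter | Value | Parameter | Value |
|---|---|---|---|
| Batch size | 16 | Weight decay | 0.005 |
| Training epochs | 300 | Momentum | 0.937 |
| Initial learning rate | 0.01 | Image size | 640×640 |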
Tab.1 Model training hyperparameter settings
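The settings in Tab.1 can be captured in a small configuration object. This page does not say which training script was used. The sketch assumes the Ultralytics YOLOv5 repository, since the baseline is YOLOv5m. In that repository:

- `lr0`, `momentum` and `weight_decay` are keys in the hyperparameter YAML files;
- `--batch-size`, `--epochs` and `--imgsz` are options of `train.py`.

The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Training settings from Tab.1 (class and method names are hypothetical)."""
    batch_size: int = 16
    epochs: int = 300
    lr0: float = 0.01            # initial learning rate
    momentum: float = 0.937      # SGD momentum
    weight_decay: float = 0.005  # as listed in Tab.1
    img_size: int = 640          # square input, 640 x 640

    def hyp(self):
        """Optimizer values keyed as in YOLOv5 hyperparameter YAML files (assumed usage)."""
        return {"lr0": self.lr0, "momentum": self.momentum, "weight_decay": self.weight_decay}

    def cli_args(self):
        """Command-line options in the form accepted by YOLOv5's train.py (assumed usage)."""
        return ["--batch-size", str(self.batch_size), "--epochs", str(self.epochs),
                "--imgsz", str(self.img_size)]


if __name__ == "__main__":
    cfg = TrainConfig()
    print(cfg.cli_args())  # ['--batch-size', '16', '--epochs', '300', '--imgsz', '640']
    print(cfg.hyp())       # {'lr0': 0.01, 'momentum': 0.937, 'weight_decay': 0.005}
```

A complete YOLOv5 hyperparameter file also contains augmentation and loss-gain keys that Tab.1 does not list. Those would keep whatever values the authors' setup used.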
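| No. | Small object layer | MFFM | CASA | MPDIoU | P/% | R/% | mAP50/% | mAP50:95/% | Params/10^6 | GFLOPs |
|---|---|---|---|---|---|---|---|---|---|---|
| A | | | | | 88.5 | 81.5 | 87.1 | 31.9 | 20.85 | 47.9 |
| B | | | | | 89.7 | 82.0 | 87.9 | 32.7 | 21.29 | 56.3 |
| C | | | | | 89.5 | 83.2 | 87.8 | 31.9 | 21.62 | 52.1 |
| D | | | | | 91.4 | 82.4 | 88.6 | 33.3 | 20.92 | 48.3 |
| E | | | | | 89.9 | 83.1 | 88.2 | 32.8 | 20.85 | 47.9 |
| F | | | | | 91.7 | 84.0 | 89.1 | 33.0 | 22.13 | 62.3 |
| G | | | | | 90.7 | 83.0 | 88.7 | 32.7 | 22.13 | 62.2 |
| H | | | | | 91.3 | 83.3 | 89.0 | 32.9 | 21.33 | 56.4 |
| I | | | | | 91.5 | 83.7 | 89.3 | 32.9 | 21.65 | 51.8 |
| J | | | | | 92.3 | 84.1 | 89.9 | 34.1 | 22.13 | 62.3 |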
Tab.2 Analysis of ablation experiment results with different combinations of improvement points
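Row A of Tab.2 matches the YOLOv5m row of Tab.3, and row J matches FMCM-YOLO. So A serves as the baseline and J as the full model. The sketch computes each row's change in mAP50, parameters and GFLOPs relative to A. It also computes an illustrative "mAP50 points per extra GFLOP" ratio. That ratio is introduced here only to compare cost against benefit; it is not a metric used in the paper.

```python
# Numeric columns of Tab.2: (P, R, mAP50, mAP50:95, Params/10^6, GFLOPs)
ABLATION = {
    "A": (88.5, 81.5, 87.1, 31.9, 20.85, 47.9),
    "B": (89.7, 82.0, 87.9, 32.7, 21.29, 56.3),
    "C": (89.5, 83.2, 87.8, 31.9, 21.62, 52.1),
    "D": (91.4, 82.4, 88.6, 33.3, 20.92, 48.3),
    "E": (89.9, 83.1, 88.2, 32.8, 20.85, 47.9),
    "F": (91.7, 84.0, 89.1, 33.0, 22.13, 62.3),
    "G": (90.7, 83.0, 88.7, 32.7, 22.13, 62.2),
    "H": (91.3, 83.3, 89.0, 32.9, 21.33, 56.4),
    "I": (91.5, 83.7, 89.3, 32.9, 21.65, 51.8),
    "J": (92.3, 84.1, 89.9, 34.1, 22.13, 62.3),
}


def deltas_vs_baseline(rows, base="A"):
    """Return {row: (d_mAP50, d_params, d_gflops, mAP50 gain per extra GFLOP or None)}."""
    b = rows[base]
    out = {}
    for name, r in rows.items():
        if name == base:
            continue
        d_map = round(r[2] - b[2], 2)
        d_params = round(r[4] - b[4], 2)
        d_gflops = round(r[5] - b[5], 2)
        # The ratio is undefined when a row adds no computation (e.g. a loss-only change).
        per_gflop = round(d_map / d_gflops, 3) if d_gflops > 0 else None
        out[name] = (d_map, d_params, d_gflops, per_gflop)
    return out


if __name__ == "__main__":
    for name, d in deltas_vs_baseline(ABLATION).items():
        print(name, d)
    # J: (2.8, 1.28, 14.4, 0.194); E adds no GFLOPs, so its ratio is None
```

For example, row D gains 1.5 mAP50 points for 0.4 extra GFLOPs. The full model J gains 2.8 points for 14.4 extra GFLOPs.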
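| Model | P/% | R/% | mAP50/% | mAP50:95/% | Params/10^6 | FPS |
|---|---|---|---|---|---|---|
| RefineDet | 88.1 | 82.4 | 85.1 | 31.4 | 35.68 | 32 |
| YOLOv5m | 88.5 | 81.5 | 87.1 | 31.9 | 20.85 | 258 |
| YOLOv8m | 90.5 | 82.2 | 87.6 | 32.4 | 29.74 | 155 |
| TPH-YOLOv5 | 91.0 | 83.7 | 89.5 | 32.1 | 45.36 | 134 |
| MSFE-YOLO-m | 91.6 | 83.5 | 89.6 | 33.1 | 59.51 | 37 |
| LS-YOLO | 90.8 | 83.6 | 89.3 | 33.9 | 22.61 | 53 |
| L-FFCA-YOLO | 91.3 | 82.8 | 89.3 | 33.2 | 5.10 | 165 |
| FMCM-YOLO (proposed) | 92.3 | 84.1 | 89.9 | 34.1 | 22.13 | 169 |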
Tab.3 Performance comparison results of different algorithms on USOD
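Tab.3 reports accuracy and speed side by side. One simple way to read it is to keep only the models that no other model beats on both mAP50 and FPS, i.e. the Pareto front. The sketch below does this for the Tab.3 values. FPS depends on hardware and inference settings that this page does not state, so the comparison is only meaningful within the table. The helper names are hypothetical.

```python
# (mAP50 %, FPS) for each model in Tab.3
RESULTS = {
    "RefineDet": (85.1, 32),
    "YOLOv5m": (87.1, 258),
    "YOLOv8m": (87.6, 155),
    "TPH-YOLOv5": (89.5, 134),
    "MSFE-YOLO-m": (89.6, 37),
    "LS-YOLO": (89.3, 53),
    "L-FFCA-YOLO": (89.3, 165),
    "FMCM-YOLO": (89.9, 169),
}


def dominates(a, b):
    """True if a is at least as good as b on both metrics and strictly better on one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def pareto_front(results):
    """Names of models not dominated by any other model, sorted alphabetically."""
    return sorted(name for name, r in results.items()
                  if not any(dominates(other, r) for other in results.values()))


def latency_ms(fps):
    """Per-image latency implied by a frames-per-second figure."""
    return 1000.0 / fps


if __name__ == "__main__":
    print(pareto_front(RESULTS))  # ['FMCM-YOLO', 'YOLOv5m']
    print(round(latency_ms(RESULTS["FMCM-YOLO"][1]), 2),
          round(latency_ms(RESULTS["YOLOv5m"][1]), 2))  # 5.92 3.88
```

FMCM-YOLO matches or exceeds every other model in Tab.3 on both axes. YOLOv5m stays on the front only because it is the fastest model in the table.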
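| Model | mAP50/% | mAP50:95/% | mAP_vt/% | mAP_t/% | mAP_s/% | FPS |
|---|---|---|---|---|---|---|
| DetectoRS | 32.8 | 14.8 | 0 | 10.8 | 28.3 | 61 |
| M-CenterNet | 40.7 | 14.5 | 6.1 | 15.0 | 19.4 | 78 |
| YOLOv5m | 54.7 | 21.7 | 10.5 | 22.1 | 27.0 | 258 |
| HANet | 53.7 | 22.1 | 10.9 | 22.2 | 27.3 | 178 |
| FFCA-YOLO | 61.7 | 27.7 | 12.6 | 24.9 | 31.8 | 171 |
| L-FFCA-YOLO | 58.3 | 25.5 | 11.7 | 23.2 | 30.1 | 165 |
| FMCM-YOLO (proposed) | 60.6 | 26.7 | 12.6 | 28.6 | 32.1 | 169 |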
Tab.4 Performance comparison results of different algorithms on AI-TOD
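The AI-TOD gains quoted in the abstract are differences in percentage points between the FMCM-YOLO and YOLOv5m rows of Tab.4. The sketch recomputes them. For contrast it also gives the relative change, which is a different quantity: a 2.1-point gain on a 10.5% baseline is a 20% relative improvement.

```python
# FMCM-YOLO and YOLOv5m rows of Tab.4 (AI-TOD), values in %
YOLOV5M = {"mAP50": 54.7, "mAP50:95": 21.7, "mAP_vt": 10.5, "mAP_t": 22.1, "mAP_s": 27.0}
FMCM = {"mAP50": 60.6, "mAP50:95": 26.7, "mAP_vt": 12.6, "mAP_t": 28.6, "mAP_s": 32.1}


def gains(new, base):
    """Return {metric: (absolute gain in percentage points, relative gain in %)}."""
    return {k: (round(new[k] - base[k], 1), round(100.0 * (new[k] - base[k]) / base[k], 1))
            for k in base}


if __name__ == "__main__":
    for metric, (pp, rel) in gains(FMCM, YOLOV5M).items():
        print(f"{metric}: +{pp} pp ({rel:+}% relative)")
```

The same distinction applies to the 2.8- and 5.9-point mAP50 gains reported for USOD and AI-TOD.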
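| Class | FMCM-YOLO mAP50/% | YOLOv5m mAP50/% |
|---|---|---|
| all | 60.6 | 54.7 |
| airplane | 66.9 | 64.3 |
| bridge | 50.4 | 44.9 |
| storage-tank | 88.9 | 77.9 |
| ship | 78.9 | 75.0 |
| swimming-pool | 51.7 | 51.2 |
| vehicle | 77.7 | 69.9 |
| person | 39.2 | 31.3 |
| wind-mill | 33.4 | 23.3 |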
Tab.5 Comparison of detection performance of various types of targets in AI-TOD dataset before and after improvement
Fig.7 Comparison of the effects of different loss functions
Fig.8 Comparison of visual detection results before and after model improvement
[1]   XU Suhui, MU Xiaodong, KE Bing, et al. Dynamic monitoring of military position based on remote sensing image[J]. Remote Sensing Technology and Application, 2014, 29(3): 511–516. (in Chinese) DOI: 10.11873/j.issn.1004-0323.2014.3.0511
[2]   YAO Yanqing, CHENG Gong, XIE Xingxing, et al. Optical remote sensing image object detection based on multi-resolution feature fusion[J]. National Remote Sensing Bulletin, 2021, 25(5): 1124–1137. (in Chinese)
[3]   YU Wenqi, CHENG Gong, WANG Meijun, et al. MAR20: a benchmark for military aircraft recognition in remote sensing images[J]. National Remote Sensing Bulletin, 2023, 27(12): 2688–2696. (in Chinese) DOI: 10.11834/jrs.20222139
[4]   GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation [C]// IEEE Conference on Computer Vision and Pattern Recognition. Columbus: IEEE, 2014: 580–587.
[5]   REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection [C]// IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 779–788.
[6]   LIU Z, GAO X, WAN Y, et al. An improved YOLOv5 method for small object detection in UAV capture scenes[J]. IEEE Access, 2023, 11: 14365–14374. DOI: 10.1109/ACCESS.2023.3241005
[7]   QIU Y, SHA F, NIU L. DKA-YOLO: enhanced small object detection via dilation kernel aggregation convolution modules[J]. IEEE Access, 2024, 12: 187353–187366. DOI: 10.1109/ACCESS.2024.3515201
[8]   XU Siyuan, WU Weilin. Research on object detection algorithm for remote sensing images based on multi-scale feature fusion[J]. Computer Engineering and Applications, 2024, 60(23): 249–256. (in Chinese)
[9]   CAI X, LAI Q, WANG Y, et al. Poly kernel inception network for remote sensing detection [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2024: 27706–27716.
[10]   WU Jiancheng, GUO Rongzuo, CHENG Jiawei, et al. Fast remote sensing image object detection algorithm based on attention feature fusion[J]. Computer Engineering and Applications, 2024, 60(1): 207–216. (in Chinese) DOI: 10.3778/j.issn.1002-8331.2303-0375
[11]   WANG Xili, LIANG Zhengyin, LIU Tao. Feature attention pyramid-based remote sensing image object detection method[J]. National Remote Sensing Bulletin, 2023, 27(2): 492–501. (in Chinese)
[12]   HU J, SHEN L, SUN G. Squeeze-and-excitation networks [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 7132–7141.
[13]   WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module [C]// ECCV 2018. Munich: Springer, 2018: 3–19.
[14]   MA S, XU Y. MPDIoU: a loss for efficient and accurate bounding box regression [EB/OL]. (2023–07–14) [2025–07–15]. https://doi.org/10.48550/arXiv.2307.07662.
[15]   ZHANG Y, YE M, ZHU G, et al. FFCA-YOLO for small object detection in remote sensing images[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 5611215. DOI: 10.1109/tgrs.2024.3363057
[16]   WANG J, YANG W, GUO H, et al. Tiny object detection in aerial images [C]// 25th International Conference on Pattern Recognition. Milan: IEEE, 2021: 3791–3798.
[17]   FU C Y, LIU W, RANGA A, et al. DSSD: deconvolutional single shot detector [EB/OL]. (2017–01–23) [2025–07–17]. https://doi.org/10.48550/arXiv.1701.06659.
[18]   JOCHER G, CHAURASIA A, QIU J. Ultralytics YOLOv8 [EB/OL]. (2023–04–03) [2025–07–17]. https://github.com/pytholic/ultralytics-yolov8.
[19]   ZHU X, LYU S, WANG X, et al. TPH-YOLOv5: improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios [C]// IEEE/CVF International Conference on Computer Vision Workshops. Montreal: IEEE, 2021: 2778–2788.
[20]   QI S, SONG X, SHANG T, et al. MSFE-YOLO: an improved YOLOv8 network for object detection on drone view[J]. IEEE Geoscience and Remote Sensing Letters, 2024, 21: 6013605. DOI: 10.1109/lgrs.2024.3432536
[21]   ZHANG W, LIU Z, ZHOU S, et al. LS-YOLO: a novel model for detecting multiscale landslides with remote sensing images[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2024, 17: 4952–4965. DOI: 10.1109/JSTARS.2024.3363160
[22]   QIAO S, CHEN L C, YUILLE A. DetectoRS: detecting objects with recursive feature pyramid and switchable atrous convolution [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 10208–10219.
[23]   GUO G, CHEN P, YU X, et al. Save the tiny, save the all: hierarchical activation network for tiny object detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34(1): 221–234. DOI: 10.1109/TCSVT.2023.3284161
[24]   ZHENG Z, WANG P, REN D, et al. Enhancing geometric factors in model learning and inference for object detection and instance segmentation[J]. IEEE Transactions on Cybernetics, 2022, 52(8): 8574–8586. DOI: 10.1109/TCYB.2021.3095305
[25]   ZHANG Y F, REN W, ZHANG Z, et al. Focal and efficient IOU loss for accurate bounding box regression[J]. Neurocomputing, 2022, 506: 146–157. DOI: 10.1016/j.neucom.2022.07.042
[26]   GEVORGYAN Z. SIoU loss: More powerful learning for bounding box regression [EB/OL]. (2022–05–25) [2025–07–17]. https://doi.org/10.48550/arXiv.2205.12740.