Small object detection algorithm for optical remote sensing images based on fusion attention mechanism

doi:10.3785/j.issn.1008-973X.2026.04.008

Journal of ZheJiang University (Engineering Science), 2026, Vol. 60, Issue (4): 763-771

Yaolian SONG1, Chi PENG1, Jingmin TANG1,*, Xuanzhi ZHAO1, Guicai YU2

1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
2. School of Physics and Electronic Information Engineering, Qinghai Minzu University, Xining 810007, China

Abstract

A small object detection algorithm, FMCM-YOLO, based on feature enhancement and a fusion attention mechanism was proposed to address three challenges of small object detection in optical remote sensing images: limited feature extraction, confusion between foreground and background, and severe missed and false detections. Firstly, a four-head detection model was designed by adding a small object detection layer, so that the numerous small objects in optical remote sensing images could be detected. Secondly, a feature enhancement module was proposed in the backbone network; it improved feature extraction capability through a multi-branch convolutional structure that introduced dilated convolutions of different sizes. Thirdly, channel and spatial attention mechanisms were fused in the neck network, and a residual structure was introduced to focus on small objects, making targets easier to distinguish from the background. Finally, MPDIoU was adopted as the bounding box regression loss function to accelerate convergence and enhance detection performance for small objects. Experimental results showed that the mAP50 of the proposed algorithm on the two public datasets USOD and AI-TOD reached 89.9% and 60.6%, respectively, which were 2.8 and 5.9 percentage points higher than those of the baseline algorithm YOLOv5m. In particular, the mean average precision for very tiny, tiny, and small objects increased by 2.1, 6.5, and 5.1 percentage points, respectively. These results show that FMCM-YOLO effectively improves the detection performance for small targets in optical remote sensing images.
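The gains above are absolute differences in percentage points, so the baseline YOLOv5m mAP50 values are implied by subtraction, and the relative improvement is a different number. A small check that uses only the figures reported in the abstract (the helper names are illustrative, not from the paper):

```python
# Figures reported in the abstract: mAP50 (%) of FMCM-YOLO and its gain
# over YOLOv5m in percentage points (pp).
REPORTED = {
    "USOD": (89.9, 2.8),
    "AI-TOD": (60.6, 5.9),
}


def implied_baseline(map50: float, gain_pp: float) -> float:
    """Baseline mAP50 implied by subtracting an absolute gain in pp."""
    return round(map50 - gain_pp, 1)


def relative_gain(map50: float, gain_pp: float) -> float:
    """The same gain expressed as a percentage of the implied baseline."""
    return round(100.0 * gain_pp / (map50 - gain_pp), 1)


for name, (map50, gain) in REPORTED.items():
    print(f"{name}: implied YOLOv5m mAP50 = {implied_baseline(map50, gain)}%, "
          f"relative gain = {relative_gain(map50, gain)}%")
```

On AI-TOD, for example, the 5.9-point gain over an implied baseline of 54.7% corresponds to roughly a 10.8% relative improvement.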

Key words: optical remote sensing image, small target detection, YOLOv5, feature enhancement, attention mechanism

Received: 26 July 2025 Published: 19 March 2026

CLC: TP 753

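Fund: National Natural Science Foundation of China (62261056); Fund of the National Defense Science and Technology Key Laboratory (23JCJQLB3301); Open Fund of Hanjiang International National Laboratory (KF2024025); Industry-University Cooperation Collaborative Education Program of the Ministry of Education (231107173102719).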

Corresponding Authors: Jingmin TANG E-mail: 39217149@qq.com;tang_min213@163.com

Cite this article:

Yaolian SONG,Chi PENG,Jingmin TANG,Xuanzhi ZHAO,Guicai YU. Small object detection algorithm for optical remote sensing images based on fusion attention mechanism. Journal of ZheJiang University (Engineering Science), 2026, 60(4): 763-771.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2026.04.008 OR https://www.zjujournals.com/eng/Y2026/V60/I4/763

Fig.1 Structure of FMCM-YOLO
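The abstract states that a small object detection layer is added to obtain four detection heads. Since the figure is not reproduced here, the sketch below assumes the common arrangement: YOLOv5's default heads at strides 8, 16 and 32 (P3 to P5) plus an extra stride-4 (P2) head for small objects. It counts the prediction grid cells for a hypothetical 640×640 input to show how much more densely the added head samples the image.

```python
# Assumed head layout: YOLOv5's default heads sit at strides 8, 16, 32
# (P3-P5); the added small-object head is assumed to be P2 at stride 4.
BASELINE_STRIDES = (8, 16, 32)
FOUR_HEAD_STRIDES = (4, 8, 16, 32)


def grid_sizes(img_size: int, strides) -> list:
    """(stride, cells per side) for each head on a square input."""
    return [(s, img_size // s) for s in strides]


def total_cells(img_size: int, strides) -> int:
    """Number of grid cells (prediction locations per anchor) over all heads."""
    return sum((img_size // s) ** 2 for s in strides)


if __name__ == "__main__":
    img = 640  # hypothetical input resolution
    for s, g in grid_sizes(img, FOUR_HEAD_STRIDES):
        print(f"stride {s:>2}: {g}x{g} grid, each cell spans {s}x{s} px")
    print("3 heads:", total_cells(img, BASELINE_STRIDES))   # 8400
    print("4 heads:", total_cells(img, FOUR_HEAD_STRIDES))  # 34000
```

Under these assumptions the stride-4 map alone contributes 25,600 of the 34,000 cells, which is why such a head can localize objects only a few pixels wide, at the price of extra computation.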

Fig.2 Structure of MFFM
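The abstract describes the feature enhancement module as a multi-branch convolutional structure with dilated convolutions of different sizes. A dilation rate d spreads a k×k kernel over k + (k − 1)(d − 1) pixels without adding weights, so parallel branches see different context sizes at the same cost. The plain-Python sketch below illustrates only that idea on a single-channel map; the three 3×3 branches, the dilation rates 1, 2 and 3, the fixed averaging weights and the fusion by summation are hypothetical choices, not the module's published configuration.

```python
def effective_kernel(k: int, d: int) -> int:
    """Spatial extent of a k x k kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)


def dilated_conv2d(x, kernel, d):
    """Single-channel 2-D dilated cross-correlation, stride 1, with zero
    'same' padding so the output keeps the input size."""
    h, w = len(x), len(x[0])
    k = len(kernel)
    pad = d * (k - 1) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for u in range(k):
                for v in range(k):
                    r, c = i + u * d - pad, j + v * d - pad
                    if 0 <= r < h and 0 <= c < w:
                        acc += kernel[u][v] * x[r][c]
            out[i][j] = acc
    return out


def multi_branch(x, dilations=(1, 2, 3)):
    """Hypothetical multi-branch block: one 3x3 averaging kernel applied at
    several dilation rates, fused by element-wise sum (a real module might
    instead concatenate branches and apply a 1x1 convolution)."""
    kernel = [[1.0 / 9] * 3 for _ in range(3)]
    branches = [dilated_conv2d(x, kernel, d) for d in dilations]
    h, w = len(x), len(x[0])
    return [[sum(b[i][j] for b in branches) for j in range(w)] for i in range(h)]


impulse = [[0.0] * 9 for _ in range(9)]
impulse[4][4] = 1.0
y = multi_branch(impulse)
print([effective_kernel(3, d) for d in (1, 2, 3)])
print(sum(1 for row in y for v in row if v != 0.0))
```

Feeding a single impulse through the block shows the fused response spreading over a 7×7 extent (25 nonzero positions) while each branch still uses only 3×3 weights.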

Fig.3 Structure of CASABlock

Fig.4 Structure of CASA
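The abstract says channel and spatial attention are fused in the neck and combined with a residual structure. The internal layout of CASA is not given here, so the following is a CBAM-style sketch (CBAM being one well-known channel-plus-spatial design) in plain Python on a C×H×W nested list. The MLP weights, the per-pixel linear spatial map used in place of CBAM's 7×7 convolution, and the placement of the identity shortcut are all assumptions for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def channel_attention(x, w1, w2):
    """CBAM-style channel weights: a shared MLP (w1: hidden x C, w2: C x hidden)
    applied to average- and max-pooled channel descriptors."""
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]
    mx = [max(map(max, ch)) for ch in x]

    def mlp(v):
        hidden = [max(0.0, h) for h in matvec(w1, v)]  # ReLU
        return matvec(w2, hidden)

    return [sigmoid(a + m) for a, m in zip(mlp(avg), mlp(mx))]


def spatial_attention(x, a=1.0, b=1.0, bias=0.0):
    """Spatial weights from channel-wise mean and max at each pixel.
    Simplification: a per-pixel linear map instead of a 7x7 convolution."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    return [[sigmoid(a * sum(x[k][i][j] for k in range(c)) / c
                     + b * max(x[k][i][j] for k in range(c)) + bias)
             for j in range(w)] for i in range(h)]


def casa_block(x, w1, w2):
    """Channel attention, then spatial attention on the refined map,
    plus an identity shortcut: y = x + Ms * (Mc * x)."""
    mc = channel_attention(x, w1, w2)
    refined = [[[v * mc[k] for v in row] for row in ch] for k, ch in enumerate(x)]
    ms = spatial_attention(refined)
    return [[[x[k][i][j] + refined[k][i][j] * ms[i][j]
              for j in range(len(ch[0]))] for i in range(len(ch))]
            for k, ch in enumerate(x)]


if __name__ == "__main__":
    feat = [[[0.1, 0.5, 0.9], [0.2, 0.4, 0.6], [0.3, 0.7, 0.8]],
            [[1.0, 0.0, 0.5], [0.5, 0.5, 0.5], [0.2, 0.9, 0.1]]]
    out = casa_block(feat, w1=[[0.5, -0.5]], w2=[[1.0], [-1.0]])  # C=2, hidden=1
    print(out[0][0])
```

Because both attention maps lie in (0, 1), the shortcut keeps every output between x and 2x for non-negative features, so the attention can only emphasize responses and never erase the original signal.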

Fig.5 Parameters of MPDIoU loss function
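In the MPDIoU formulation by Ma and Xu (2023), MPDIoU subtracts from IoU the squared distance d1 between the top-left corners and d2 between the bottom-right corners of the predicted and ground-truth boxes, each normalized by w² + h², where w and h are the input image width and height; the loss is 1 − MPDIoU. A minimal sketch, assuming boxes in (x1, y1, x2, y2) pixel-corner format (a pipeline working in centre format would convert first); the function names are illustrative.

```python
def iou(p, g):
    """IoU of two boxes given as (x1, y1, x2, y2) pixel corners."""
    iw = max(0.0, min(p[2], g[2]) - max(p[0], g[0]))
    ih = max(0.0, min(p[3], g[3]) - max(p[1], g[1]))
    inter = iw * ih
    union = ((p[2] - p[0]) * (p[3] - p[1])
             + (g[2] - g[0]) * (g[3] - g[1]) - inter)
    return inter / union if union > 0 else 0.0


def mpdiou_loss(p, g, img_w, img_h):
    """Loss = 1 - MPDIoU, MPDIoU = IoU - d1^2/(w^2+h^2) - d2^2/(w^2+h^2)."""
    norm = img_w ** 2 + img_h ** 2
    d1_sq = (p[0] - g[0]) ** 2 + (p[1] - g[1]) ** 2  # top-left corners
    d2_sq = (p[2] - g[2]) ** 2 + (p[3] - g[3]) ** 2  # bottom-right corners
    return 1.0 - (iou(p, g) - d1_sq / norm - d2_sq / norm)


gt = (0, 0, 10, 10)
print(mpdiou_loss(gt, gt, 640, 640))                # perfect match
print(mpdiou_loss((5, 0, 15, 10), gt, 640, 640))    # partial overlap
print(mpdiou_loss((20, 20, 30, 30), gt, 640, 640))  # disjoint boxes
```

Unlike plain 1 − IoU, which is stuck at 1 for every non-overlapping prediction, the corner-distance terms keep increasing as a disjoint box moves farther away, which matters for small objects where predictions often miss the target entirely early in training.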

Fig.6 Size distribution of target instance in dataset
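The abstract reports results for very tiny, tiny and small objects. AI-TOD, one of the two datasets used, defines these categories by absolute object size: very tiny 2 to 8 px, tiny 8 to 16 px, small 16 to 32 px and medium 32 to 64 px. The sketch below bins boxes this way to build a size histogram; that the figure and per-size results use exactly these bands, and the half-open treatment of band edges, are assumptions.

```python
import math
from collections import Counter

# Size bands of the AI-TOD benchmark, in pixels of sqrt(box area).
# Treating each band as a half-open interval [lo, hi) is an assumption.
SIZE_BANDS = (
    ("very tiny", 2, 8),
    ("tiny", 8, 16),
    ("small", 16, 32),
    ("medium", 32, 64),
)


def size_band(w: float, h: float) -> str:
    """Assign a box of width w and height h (pixels) to a size band."""
    s = math.sqrt(w * h)
    for name, lo, hi in SIZE_BANDS:
        if lo <= s < hi:
            return name
    return "outside bands"


def size_distribution(boxes) -> Counter:
    """Count boxes per band, as in a dataset size histogram."""
    return Counter(size_band(w, h) for w, h in boxes)


# Hypothetical (width, height) pairs, for illustration only.
print(size_distribution([(4, 5), (6, 6), (10, 12), (20, 30), (40, 50), (90, 90)]))
```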

Tab.1 Model training hyperparameter settings

Tab.2 Analysis of ablation experiment results with different combinations of improvement points

Tab.3 Performance comparison results of different algorithms on USOD

Tab.4 Performance comparison results of different algorithms on AI-TOD

Tab.5 Comparison of detection performance of various types of targets in AI-TOD dataset before and after improvement

Fig.7 Comparison of effects of different loss functions

Fig.8 Comparison of visual detection effects before and after model improvement

