Journal of ZheJiang University (Engineering Science)  2026, Vol. 60 Issue (1): 19-31    DOI: 10.3785/j.issn.1008-973X.2026.01.002
    
Aerial small target detection algorithm based on multi-scale feature enhancement
Jian XIAO1, Xinze HE1, Hongliang CHENG1, Xiaoyuan YANG1, Xin HU2,*
1. School of Electronics and Control Engineering, Chang’an University, Xi’an 710064, China
2. School of Energy and Electrical Engineering, Chang’an University, Xi’an 710064, China

Abstract  

To address the low detection accuracy and large parameter count of small target detection in aerial images, an aerial small target detection algorithm that balances performance and resource consumption was proposed. Building on YOLOv8s, an adaptive detail-enhanced module (ADEM) was designed that reduces the channel dimension and strengthens the focus on high-frequency features, capturing the fine-grained features of small targets while discarding redundant information. The feature fusion network was optimized on the basis of the PAN-FPN architecture to increase the attention paid to shallow features, and multi-scale convolutional kernels were introduced to strengthen the focus on target contextual information, adapting the network to small object detection. A parameter-adjustable Nin-IoU was constructed to overcome the limited flexibility and generalization of the traditional IoU; its adjustable parameters allow the loss to be tailored to different detection tasks. A lightweight detection head was proposed to improve the integration of multi-scale feature information while reducing the transmission of redundant information. Experimental results on the VisDrone2019 dataset showed that the proposed algorithm achieved an mAP0.5 of 50.3% with only 8.08×10^6 parameters, a 27.4% reduction in parameters and an improvement of 11.5 percentage points in mAP0.5 compared with the YOLOv8s baseline. Experimental results on the DOTA and DIOR datasets further indicated that the proposed algorithm generalizes well.
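The two headline figures can be reproduced from the YOLOv8s baseline values reported in Tab.4 (11.13×10^6 parameters, mAP0.5 of 38.8%). The following is a minimal sketch of that arithmetic; the function and variable names are illustrative and do not come from the paper.

```python
def relative_reduction(baseline: float, proposed: float) -> float:
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    return (baseline - proposed) / baseline * 100.0


def point_gain(baseline_pct: float, proposed_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return proposed_pct - baseline_pct


# Values copied from Tab.4: YOLOv8s baseline vs. the full model, row (g).
param_cut = relative_reduction(11.13, 8.08)  # parameters, in units of 10^6
map_gain = point_gain(38.8, 50.3)            # mAP0.5, in percent

print(f"parameter reduction: {param_cut:.1f}%")
print(f"mAP0.5 gain: {map_gain:.1f} percentage points")
```

Note that the parameter figure is a relative change, whereas the accuracy figure is an absolute difference in percentage points.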



Key words: object detection, YOLOv8, unmanned aerial vehicle image, feature fusion, loss function
Received: 25 November 2024      Published: 15 December 2025
CLC:  TP 391.4  
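Fund: Shaanxi Province Qinchuangyuan "Scientist + Engineer" Team Construction Project (2024QCY-KXJ-161); Xi'an Artificial Intelligence Key Industrial Chain Project (23ZDCYJSGG0013-2023).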
Corresponding Authors: Xin HU     E-mail: xiaojian@-chd.edu.cn;huxin@chd.edu.cn
Cite this article:

Jian XIAO,Xinze HE,Hongliang CHENG,Xiaoyuan YANG,Xin HU. Aerial small target detection algorithm based on multi-scale feature enhancement. Journal of ZheJiang University (Engineering Science), 2026, 60(1): 19-31.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2026.01.002     OR     https://www.zjujournals.com/eng/Y2026/V60/I1/19


Fig.1 Overall network architecture of small object detection algorithm based on multi-scale feature enhancement
Fig.2 Structure of adaptive detail-enhanced module
Fig.3 Conditional convolution structure
Fig.4 Detail-enhanced convolution structure
Fig.5 Multi-scale feature fusion network diagram
Fig.6 Structure diagram of SPDConv module
Fig.7 Structure diagram of CSPOmni module
Fig.8 Relationship between regression loss gradient and IoU
Fig.9 T-shaped perceptual field feature fusion module architecture
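| N | ratio | P | R | mAP0.5 | mAP0.5:0.95 |
|---|---|---|---|---|---|
| 0 | 1.1 | 0.603 | 0.375 | 0.494 | 0.321 |
| 0 | 1.2 | 0.607 | 0.375 | 0.495 | 0.322 |
| 0 | 1.3 | 0.610 | 0.375 | 0.496 | 0.322 |
| 0 | 1.4 | 0.613 | 0.375 | 0.497 | 0.323 |
| 0 | 1.5 | 0.614 | 0.375 | 0.498 | 0.323 |
| 1 | 1.1 | 0.628 | 0.373 | 0.502 | 0.325 |
| 1 | 1.2 | 0.629 | 0.372 | 0.502 | 0.326 |
| 1 | 1.3 | 0.630 | 0.372 | 0.502 | 0.326 |
| 1 | 1.4 | 0.631 | 0.372 | 0.503 | 0.326 |
| 1 | 1.5 | 0.632 | 0.372 | 0.503 | 0.326 |
| 5 | 1.1 | 0.636 | 0.369 | 0.503 | 0.327 |
| 5 | 1.2 | 0.637 | 0.368 | 0.503 | 0.326 |
| 5 | 1.3 | 0.637 | 0.367 | 0.503 | 0.326 |
| 5 | 1.4 | 0.637 | 0.367 | 0.502 | 0.326 |
| 5 | 1.5 | 0.637 | 0.366 | 0.502 | 0.326 |
| 9 | 1.1 | 0.637 | 0.366 | 0.502 | 0.326 |
| 9 | 1.2 | 0.638 | 0.365 | 0.502 | 0.326 |
| 9 | 1.3 | 0.638 | 0.364 | 0.501 | 0.326 |
| 9 | 1.4 | 0.637 | 0.362 | 0.500 | 0.325 |
| 9 | 1.5 | 0.638 | 0.361 | 0.500 | 0.325 |
| 15 | 1.1 | 0.638 | 0.363 | 0.501 | 0.326 |
| 15 | 1.2 | 0.638 | 0.361 | 0.500 | 0.325 |
| 15 | 1.3 | 0.638 | 0.359 | 0.499 | 0.325 |
| 15 | 1.4 | 0.639 | 0.358 | 0.498 | 0.324 |
| 15 | 1.5 | 0.639 | 0.356 | 0.497 | 0.324 |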
Tab.1 Results of Nin-IoU ablation experiment
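| Model | P | R | mAP0.5 | mAP0.5:0.95 |
|---|---|---|---|---|
| IoU | 0.599 | 0.376 | 0.492 | 0.320 |
| N-IoU | 0.634 | 0.370 | 0.500 | 0.324 |
| Inner-IoU | 0.603 | 0.375 | 0.494 | 0.321 |
| Nin-IoU | 0.636 | 0.369 | 0.503 | 0.326 |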
Tab.2 Comparison of loss function experimental results
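One of the baselines in Tab.2 is Inner-IoU [17], which evaluates IoU on auxiliary boxes obtained by scaling both boxes about their centers by a factor `ratio` (the `ratio` column in Tab.1). The sketch below illustrates only that auxiliary-box step, assuming a `(cx, cy, w, h)` box format. It is not the paper's Nin-IoU, whose parameter N is not defined in this excerpt, and the function names are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h); assumed format


def scaled_corners(box: Box, ratio: float) -> Tuple[float, float, float, float]:
    """Corners (x1, y1, x2, y2) of the box scaled by `ratio` about its center."""
    cx, cy, w, h = box
    half_w, half_h = w * ratio / 2.0, h * ratio / 2.0
    return cx - half_w, cy - half_h, cx + half_w, cy + half_h


def inner_iou(pred: Box, target: Box, ratio: float = 1.0, eps: float = 1e-9) -> float:
    """IoU of the two auxiliary boxes; ratio=1.0 gives the ordinary IoU."""
    px1, py1, px2, py2 = scaled_corners(pred, ratio)
    tx1, ty1, tx2, ty2 = scaled_corners(target, ratio)
    # Overlap extent along each axis, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    return inter / (union + eps)


# Two equally sized boxes offset horizontally by 3 units.
pred = (10.0, 10.0, 4.0, 4.0)
target = (13.0, 10.0, 4.0, 4.0)
plain = inner_iou(pred, target, ratio=1.0)
enlarged = inner_iou(pred, target, ratio=1.5)
print(f"ratio=1.0: {plain:.3f}, ratio=1.5: {enlarged:.3f}")
```

For these two slightly offset boxes, a ratio above 1 enlarges the auxiliary boxes and raises the overlap score, while a ratio below 1 shrinks them and lowers it.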
| Module | $P_{\mathrm{ara}}^{1}$/10^6 | $P_{\mathrm{ara}}^{2}$/10^6 | FLOPs/10^9 |
|---|---|---|---|
| OKM | 2.11 | 9.57 | 36.6 |
| OKM+DC | 1.62 | 9.08 | 30.3 |
| OKM+CSP | 0.76 | 8.21 | 25.4 |
| CSPOmni | 0.64 | 8.08 | 23.6 |
Tab.3 Results of CSPOmni ablation experiment
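| Model | ADEM | MFFN | LTDH | Nin-IoU | mAP0.5/% | mAP0.5:0.95/% | APsmall0.5:0.95/% | Pre/% | Para/10^6 | FLOPs/10^9 | F/(frame·s^-1) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| YOLOv8s | | | | | 38.8 | 23.2 | 12.4 | 49.9 | 11.13 | 28.5 | 117.3 |
| (a) | | | | | 38.5 | 22.9 | 11.7 | 49.7 | 7.85 | 19.2 | 117.6 |
| (b) | | | | | 43.8 | 26.6 | 14.5 | 53.1 | 12.07 | 40.1 | 102.1 |
| (c) | | | | | 39.6 | 23.9 | 12.7 | 52.1 | 10.41 | 21.2 | 114.9 |
| (d) | | | | | 44.9 | 29.1 | 13.7 | 58.4 | 11.13 | 28.5 | 80.9 |
| (e) | | | | | 43.5 | 26.1 | 13.8 | 52.8 | 8.79 | 31.1 | 105.7 |
| (f) | | | | | 44.4 | 27.4 | 14.7 | 56.2 | 8.08 | 23.6 | 93.5 |
| (g) | | | | | 50.3 | 32.7 | 15.7 | 63.6 | 8.08 | 23.6 | 57.3 |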
Tab.4 Overall ablation results of small target detection algorithm based on multi-scale feature enhancement
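Class columns give AP/%.

| Model | Pedestrian | People | Bicycle | Car | Van | Truck | Tricycle | Awning-tricycle | Bus | Motorcycle | mAP0.5/% | F/(frame·s^-1) | Para/10^6 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Faster R-CNN[20] | 20.9 | 14.8 | 7.3 | 51.0 | 29.7 | 19.5 | 14.0 | 8.8 | 30.5 | 21.2 | 21.8 | 14.4 | - |
| YOLOv5s | 39.0 | 31.3 | 11.2 | 73.5 | 35.4 | 29.5 | 20.5 | 11.1 | 43.1 | 37.0 | 33.2 | 118.0 | 7.03 |
| TPH-YOLOv5[1] | 53.3 | 42.1 | 21.1 | 83.7 | 45.2 | 42.5 | 33.0 | 16.3 | 61.1 | 51.0 | 44.9 | 34.0 | 60.42 |
| YOLOv7-tiny[21] | 37.9 | 34.6 | 9.4 | 76.1 | 36.3 | 29.8 | 20.1 | 10.6 | 43.2 | 41.8 | 34.0 | 89.0 | 6.03 |
| YOLOv8s | 42.0 | 32.8 | 12.5 | 79.4 | 44.7 | 35.5 | 26.9 | 17.1 | 54.0 | 43.3 | 38.8 | 117.3 | 11.13 |
| YOLOv8l | 51.1 | 39.8 | 21.9 | 82.9 | 49.3 | 45.4 | 38.1 | 20.4 | 67.0 | 52.3 | 46.8 | 59.0 | 43.69 |
| YOLOv9-C[22] | 34.0 | 18.4 | 15.4 | 77.5 | 45.2 | 54.1 | 24.8 | 24.1 | 64.9 | 38.3 | 39.7 | - | 50.90 |
| YOLOv11s | 41.6 | 31.8 | 11.2 | 79.5 | 45.4 | 35.5 | 26.1 | 15.5 | 55.1 | 43.3 | 38.5 | 121.8 | 9.46 |
| Proposed model | 53.9 | 48.3 | 27.5 | 81.3 | 53.6 | 49.3 | 40.0 | 24.5 | 71.9 | 52.8 | 50.3 | 57.3 | 8.08 |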
Tab.5 Comparative results of different algorithms in average precision and parameters on VisDrone 2019 dataset
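The mAP0.5 column in Tab.5 is consistent with the usual definition of mAP as the unweighted mean of the per-class AP values. A short check on the proposed model's row follows, assuming its first ten values are the per-class AP figures; the variable names are illustrative.

```python
from statistics import mean

# Per-class AP (%) for the proposed model, copied from its row in Tab.5.
per_class_ap = [53.9, 48.3, 27.5, 81.3, 53.6, 49.3, 40.0, 24.5, 71.9, 52.8]

# mAP as the unweighted mean over the ten classes.
map_50 = mean(per_class_ap)
print(f"mAP0.5 = {map_50:.1f}%")
```

Rounded to one decimal place, the mean matches the 50.3% reported for the proposed model.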
Fig.10 Comparison of YOLOv8s and proposed algorithm on target detection performance in complex scenes
Fig.11 Heatmap visualization results of YOLOv8s and proposed algorithm in different scenarios
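| Model | DIOR P/% | DIOR R/% | DIOR mAP0.5/% | DOTA P/% | DOTA R/% | DOTA mAP0.5/% |
|---|---|---|---|---|---|---|
| YOLOv8s | 81.5 | 71.9 | 73.7 | 71.6 | 40.8 | 44.0 |
| Proposed model | 81.0 | 74.1 | 77.1 | 73.6 | 40.1 | 56.7 |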
Tab.6 Generalization performance across diverse scenarios
Fig.12 Comparison of generalization performance between YOLOv8s and proposed algorithm in different scenarios
[1]   ZHU X, LYU S, WANG X, et al. TPH-YOLOv5: improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops. Montreal: IEEE, 2021: 2778-2788.
[2]   LUO X, WU Y, WANG F. Target detection method of UAV aerial imagery based on improved YOLOv5 [J]. Remote Sensing, 2022, 14 (19): 5063. doi: 10.3390/rs14195063
[3]   SONG Yaolian, WANG Can, LI Dayan, et al. UAV small target detection algorithm based on improved YOLOv5s [J]. Journal of Zhejiang University: Engineering Science, 2024, 58 (12): 2417-2426.
[4]   DENG Tianmin, YU Yang, CHEN Yuetian, et al. Small object detection algorithm for aerial photography based on adaptive compound convolution [J/OL]. Journal of Beijing University of Aeronautics and Astronautics, 2024: 1-14. (2024-06-19) [2024-11-19]. https://doi.org/10.13700/j.bh.100-5965.2024.0135.
[5]   CAO J, BAO W, SHANG H, et al. GCL-YOLO: a GhostConv-based lightweight YOLO network for UAV small object detection [J]. Remote Sensing, 2023, 15 (20): 4932. doi: 10.3390/rs15204932
[6]   WANG H, LIU C, CAI Y, et al. YOLOv8-QSD: an improved small object detection algorithm for autonomous vehicles based on YOLOv8 [J]. IEEE Transactions on Instrumentation and Measurement, 2024, 73: 2513916.
[7]   FENG F, HU Y, LI W, et al. Improved YOLOv8 algorithms for small object detection in aerial imagery [J]. Journal of King Saud University-Computer and Information Sciences, 2024, 36 (6): 102113. doi: 10.1016/j.jksuci.2024.102113
[8]   BODLA N, SINGH B, CHELLAPPA R, et al. Soft-NMS: improving object detection with one line of code [C]// Proceedings of the IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 5562-5570.
[9]   CHEN J, KAO S H, HE H, et al. Run, don’t walk: chasing higher FLOPS for faster neural networks [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023: 12021-12031.
[10]   YANG B, BENDER G, LE Q V, et al. CondConv: conditionally parameterized convolutions for efficient inference [C]// Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver: NeurIPS Foundation, 2020: 1296-1307.
[11]   CHEN Z, HE Z, LU Z M. DEA-net: single image dehazing based on detail-enhanced convolution and content-guided attention [J]. IEEE Transactions on Image Processing, 2024, 33: 1002-1015. doi: 10.1109/TIP.2024.3354108
[12]   LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 936-944.
[13]   LIU S, QI L, QIN H, et al. Path aggregation network for instance segmentation [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 8759-8768.
[14]   SUNKARA R, LUO T. No more strided convolutions or pooling: a new CNN building block for low-resolution images and small objects [C]// Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Grenoble: Springer, 2023: 443-459.
[15]   CUI Y, REN W, KNOLL A. Omni-kernel network for image restoration [J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2024, 38 (2): 1426-1434. doi: 10.1609/aaai.v38i2.27907
[17]   ZHANG H, XU C, ZHANG S J. Inner-IoU: more effective intersection over union loss with auxiliary bounding box [EB/OL]. (2023-11-14) [2024-11-20]. https://arxiv.org/abs/2311.02877.
[18]   TIAN Z, SHEN C, CHEN H, et al. FCOS: a simple and strong anchor-free object detector [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44 (4): 1922-1933.
[19]   DU D, ZHU P, WEN L, et al. VisDrone-DET2019: the vision meets drone object detection in image challenge results [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision Workshop. Seoul: IEEE, 2019: 213-226.
[20]   YU W, YANG T, CHEN C. Towards resolving the challenge of long-tail distribution in UAV images for object detection [C]// Proceedings of the IEEE Winter Conference on Applications of Computer Vision. Waikoloa: IEEE, 2021: 3257-3266.
[21]   WANG C Y, BOCHKOVSKIY A, LIAO H M. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023: 7464-7475.
[22]   WANG C Y, YEH I H, LIAO H Y M. YOLOv9: learning what you want to learn using programmable gradient information [C]// European Conference on Computer Vision. Milan: Springer, 2025: 1-21.
[23]   SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization [J]. International Journal of Computer Vision, 2020, 128 (2): 336-359. doi: 10.1007/s11263-019-01228-7
[24]   XIA G S, BAI X, DING J, et al. DOTA: a large-scale dataset for object detection in aerial images [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 3974-3983.
[25]   LI K, WAN G, CHENG G, et al. Object detection in optical remote sensing images: a survey and a new benchmark [J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2020, 159: 296-307.