Journal of Zhejiang University (Engineering Science)  2026, Vol. 60 Issue (1): 19-31    DOI: 10.3785/j.issn.1008-973X.2026.01.002
Computer Technology
Aerial small target detection algorithm based on multi-scale feature enhancement
Jian XIAO1, Xinze HE1, Hongliang CHENG1, Xiaoyuan YANG1, Xin HU2,*
1. School of Electronics and Control Engineering, Chang’an University, Xi’an 710064, China
2. School of Energy and Electrical Engineering, Chang’an University, Xi’an 710064, China
Abstract:

An aerial small target detection algorithm that balances performance and resource consumption was proposed to address the low detection accuracy and large parameter count of models for small target detection in aerial images. With YOLOv8s as the baseline network, an adaptive detail-enhanced module (ADEM) was proposed that reduces the channel dimension and strengthens the focus on high-frequency features, capturing the fine-grained features of small targets while discarding redundant information. The feature fusion network was adjusted on the basis of the PAN-FPN architecture to pay more attention to shallow features, and multi-scale convolution kernels were introduced to strengthen the focus on target context, adapting the network to small target detection scenarios. To overcome the limited flexibility and generalization of the traditional IoU, a parameter-adjustable Nin-IoU was constructed, whose adjustable parameters allow the IoU to be tuned to the needs of different detection tasks. A lightweight detection head was proposed to strengthen the fusion of multi-scale feature information while reducing the transmission of redundant information. On the VisDrone2019 dataset, the proposed algorithm achieved an mAP0.5 of 50.3% with 8.08×10^6 parameters; compared with the baseline YOLOv8s, the parameter count fell by 27.4% and mAP0.5 rose by 11.5 percentage points. Experimental results on the DOTA and DIOR datasets indicated that the proposed algorithm has strong generalization ability.

Key words: object detection    YOLOv8    unmanned aerial vehicle image    feature fusion    loss function
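Received: 2024-11-25    Published: 2025-12-15
CLC number: TP 391.4
Supported by: Shaanxi Province Qinchuangyuan "Scientist + Engineer" Team Building Project (2024QCY-KXJ-161); Xi’an Key Industrial Chain Project for Artificial Intelligence (23ZDCYJSGG0013-2023).
Corresponding author: Xin HU     E-mail: xiaojian@chd.edu.cn; huxin@chd.edu.cn
About the author: Jian XIAO (born 1975), male, associate professor, Ph.D., engaged in research on detection technology. orcid.org/0000-0003-0650-6099. E-mail: xiaojian@chd.edu.cn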
Cite this article:

Jian XIAO, Xinze HE, Hongliang CHENG, Xiaoyuan YANG, Xin HU. Aerial small target detection algorithm based on multi-scale feature enhancement. Journal of Zhejiang University (Engineering Science), 2026, 60(1): 19-31.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2026.01.002
https://www.zjujournals.com/eng/CN/Y2026/V60/I1/19

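Fig. 1  Overall network structure of the multi-scale feature enhancement small target detection algorithm
Fig. 2  Structure of the adaptive detail-enhanced module
Fig. 3  Structure of the conditional convolution
Fig. 4  Structure of the detail-enhanced convolution
Fig. 5  Multi-scale feature fusion network
Fig. 6  Structure of the SPDConv module
Fig. 7  Structure of the CSPOmni module
Fig. 8  Relationship between the regression loss gradient and ${\mathrm{IoU}}$
Fig. 9  Structure of the T-shaped receptive field feature fusion module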
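The caption of Fig. 6 names an SPDConv module. As a minimal sketch, assuming that module follows the space-to-depth design of SUNKARA and LUO (reference 14), the NumPy code below shows only the space-to-depth rearrangement: spatial resolution is reduced by moving pixels into the channel axis instead of discarding them with a strided convolution or pooling. The function name `space_to_depth` and the channel ordering are illustrative assumptions, and the non-strided convolution that follows this step in reference 14 is omitted.

```python
import numpy as np


def space_to_depth(x: np.ndarray, scale: int = 2) -> np.ndarray:
    """Rearrange a (C, H, W) feature map into (C * scale**2, H // scale, W // scale).

    Every pixel of the input is kept; it is only moved to a different channel.
    """
    c, h, w = x.shape
    if h % scale or w % scale:
        raise ValueError("H and W must be divisible by scale")
    # One sub-sampled copy per (row offset i, column offset j), stacked on channels.
    parts = [x[:, i::scale, j::scale] for j in range(scale) for i in range(scale)]
    return np.concatenate(parts, axis=0)


if __name__ == "__main__":
    feature = np.arange(16).reshape(1, 4, 4)
    out = space_to_depth(feature)
    print(out.shape)  # (4, 2, 2)
```

Because no pixel is dropped, fine detail from small objects survives the downsampling step, which is the motivation given in reference 14 for low-resolution images and small objects.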
N   ratio  P      R      mAP0.5  mAP0.5:0.95
0   1.1    0.603  0.375  0.494   0.321
0   1.2    0.607  0.375  0.495   0.322
0   1.3    0.610  0.375  0.496   0.322
0   1.4    0.613  0.375  0.497   0.323
0   1.5    0.614  0.375  0.498   0.323
1   1.1    0.628  0.373  0.502   0.325
1   1.2    0.629  0.372  0.502   0.326
1   1.3    0.630  0.372  0.502   0.326
1   1.4    0.631  0.372  0.503   0.326
1   1.5    0.632  0.372  0.503   0.326
5   1.1    0.636  0.369  0.503   0.327
5   1.2    0.637  0.368  0.503   0.326
5   1.3    0.637  0.367  0.503   0.326
5   1.4    0.637  0.367  0.502   0.326
5   1.5    0.637  0.366  0.502   0.326
9   1.1    0.637  0.366  0.502   0.326
9   1.2    0.638  0.365  0.502   0.326
9   1.3    0.638  0.364  0.501   0.326
9   1.4    0.637  0.362  0.500   0.325
9   1.5    0.638  0.361  0.500   0.325
15  1.1    0.638  0.363  0.501   0.326
15  1.2    0.638  0.361  0.500   0.325
15  1.3    0.638  0.359  0.499   0.325
15  1.4    0.639  0.358  0.498   0.324
15  1.5    0.639  0.356  0.497   0.324
Table 1  Ablation results for Nin-IoU
Model      P      R      mAP0.5  mAP0.5:0.95
IoU        0.599  0.376  0.492   0.320
N-IoU      0.634  0.370  0.500   0.324
Inner-IoU  0.603  0.375  0.494   0.321
Nin-IoU    0.636  0.369  0.503   0.326
Table 2  Comparison of loss functions
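One of the baselines in the loss comparison is Inner-IoU (reference 17), which computes IoU on auxiliary boxes obtained by scaling both the predicted and the ground-truth box about their own centres by a factor `ratio`. The sketch below illustrates that published baseline only; it is not the Nin-IoU proposed in this paper, whose formula is not given on this page. The function name and the `(xc, yc, w, h)` box layout are assumptions made for illustration.

```python
def inner_iou(box, gt, ratio=1.0):
    """IoU of two (xc, yc, w, h) boxes after scaling each about its centre by `ratio`."""

    def corners(b):
        xc, yc, w, h = b
        half_w, half_h = w * ratio / 2.0, h * ratio / 2.0
        return xc - half_w, yc - half_h, xc + half_w, yc + half_h

    l1, t1, r1, b1 = corners(box)
    l2, t2, r2, b2 = corners(gt)
    # Overlap of the two scaled (auxiliary) boxes, clamped at zero.
    inter_w = max(0.0, min(r1, r2) - max(l1, l2))
    inter_h = max(0.0, min(b1, b2) - max(t1, t2))
    inter = inter_w * inter_h
    # Areas of the scaled boxes grow with ratio squared.
    union = (box[2] * box[3] + gt[2] * gt[3]) * ratio ** 2 - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    pred, target = (0.0, 0.0, 2.0, 2.0), (1.0, 0.0, 2.0, 2.0)
    for r in (0.5, 1.0, 1.5):
        print(r, inner_iou(pred, target, r))
```

With `ratio = 1.0` the function reduces to the ordinary IoU. For this pair of half-overlapping boxes, a ratio above 1 enlarges the auxiliary boxes and raises the measured overlap, while a ratio below 1 shrinks them until they no longer intersect; the corresponding loss in reference 17 is one minus this value.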
Module    $P_{\mathrm{ara}}^1$/10^6  $P_{\mathrm{ara}}^2$/10^6  FLOPs/10^9
OKM       2.11                       9.57                       36.6
OKM+DC    1.62                       9.08                       30.3
OKM+CSP   0.76                       8.21                       25.4
CSPOmni   0.64                       8.08                       23.6
Table 3  Ablation results for CSPOmni
Model    ADEM  MFFN  LTDH  Nin-IoU  mAP0.5/%  mAP0.5:0.95/%  APsmall0.5:0.95/%  Pre/%  Para/10^6  FLOPs/10^9  F/(frame·s^-1)
YOLOv8s                             38.8      23.2           12.4               49.9   11.13      28.5        117.3
(a)                                 38.5      22.9           11.7               49.7   7.85       19.2        117.6
(b)                                 43.8      26.6           14.5               53.1   12.07      40.1        102.1
(c)                                 39.6      23.9           12.7               52.1   10.41      21.2        114.9
(d)                                 44.9      29.1           13.7               58.4   11.13      28.5        80.9
(e)                                 43.5      26.1           13.8               52.8   8.79       31.1        105.7
(f)                                 44.4      27.4           14.7               56.2   8.08       23.6        93.5
(g)                                 50.3      32.7           15.7               63.6   8.08       23.6        57.3
Table 4  Overall ablation results for the multi-scale feature enhancement small target detection algorithm
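The headline figures in the abstract can be recomputed from the YOLOv8s row and row (g) of this table. The short script below is only an arithmetic check under that reading of the table; the helper name `relative_reduction` is hypothetical.

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Percentage by which `new` is smaller than `baseline`."""
    return (baseline - new) / baseline * 100.0


# Values copied from the ablation table: YOLOv8s baseline and row (g).
baseline_params, proposed_params = 11.13, 8.08   # parameters, in units of 10^6
baseline_map50, proposed_map50 = 38.8, 50.3      # mAP0.5, in percent

param_cut = round(relative_reduction(baseline_params, proposed_params), 1)
map_gain = round(proposed_map50 - baseline_map50, 1)

print(f"parameter reduction: {param_cut}%")          # 27.4%
print(f"mAP0.5 gain: {map_gain} percentage points")  # 11.5 percentage points
```

The parameter change is a relative reduction, whereas the accuracy change is an absolute difference between two percentages, which is why the abstract reports it in percentage points.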
Model             AP/%                                                                                    mAP0.5/%  F/(frame·s^-1)  Para/10^6
                  Pedestrian  People  Bicycle  Car   Van   Truck  Tricycle  Awning-tricycle  Bus   Motor
Faster R-CNN[20]  20.9        14.8    7.3      51.0  29.7  19.5   14.0      8.8              30.5  21.2   21.8      14.4            -
YOLOv5s           39.0        31.3    11.2     73.5  35.4  29.5   20.5      11.1             43.1  37.0   33.2      118.0           7.03
TPH-YOLOv5[1]     53.3        42.1    21.1     83.7  45.2  42.5   33.0      16.3             61.1  51.0   44.9      34.0            60.42
YOLOv7-tiny[21]   37.9        34.6    9.4      76.1  36.3  29.8   20.1      10.6             43.2  41.8   34.0      89.0            6.03
YOLOv8s           42.0        32.8    12.5     79.4  44.7  35.5   26.9      17.1             54.0  43.3   38.8      17.3            11.13
YOLOv8l           51.1        39.8    21.9     82.9  49.3  45.4   38.1      20.4             67.0  52.3   46.8      59.0            43.69
YOLOv9-C[22]      34.0        18.4    15.4     77.5  45.2  54.1   24.8      24.1             64.9  38.3   39.7      -               50.90
YOLOv11s          41.6        31.8    11.2     79.5  45.4  35.5   26.1      15.5             55.1  43.3   38.5      121.8           9.46
Proposed model    53.9        48.3    27.5     81.3  53.6  49.3   40.0      24.5             71.9  52.8   50.3      57.3            8.08
Table 5  Comparison of average precision and parameter counts of different algorithms on the VisDrone dataset
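The mAP0.5 column of this table is consistent with the unweighted mean of the ten per-class AP values in the same row. The sketch below checks that for two rows; the variable names are illustrative, and the AP values are copied from the table.

```python
def mean_ap(per_class_ap):
    """Unweighted mean of per-class AP values, rounded to one decimal place."""
    return round(sum(per_class_ap) / len(per_class_ap), 1)


# Ten per-class AP values (%) from the VisDrone comparison table.
ap_yolov8s = [42.0, 32.8, 12.5, 79.4, 44.7, 35.5, 26.9, 17.1, 54.0, 43.3]
ap_proposed = [53.9, 48.3, 27.5, 81.3, 53.6, 49.3, 40.0, 24.5, 71.9, 52.8]

print(mean_ap(ap_yolov8s))   # 38.8
print(mean_ap(ap_proposed))  # 50.3
```

Because the mean is unweighted, rare classes such as the awning-tricycle count as much as frequent ones such as the car, so gains on hard, small classes move mAP0.5 as strongly as gains on easy ones.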
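Fig. 10  Comparison of object detection results of YOLOv8s and the proposed algorithm in complex scenes
Fig. 11  Heat map visualizations of YOLOv8s and the proposed algorithm in different scenes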
Model           DIOR                    DOTA
                P/%   R/%   mAP0.5/%    P/%   R/%   mAP0.5/%
YOLOv8s         81.5  71.9  73.7        71.6  40.8  44.0
Proposed model  81.0  74.1  77.1        73.6  40.1  56.7
Table 6  Generalization results in different scenarios
Fig. 12  Comparison of generalization results of YOLOv8s and the proposed algorithm in different scenes
1 ZHU X, LYU S, WANG X, et al. TPH-YOLOv5: improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops. Montreal: IEEE, 2021: 2778-2788.
2 LUO X, WU Y, WANG F. Target detection method of UAV aerial imagery based on improved YOLOv5 [J]. Remote Sensing, 2022, 14(19): 5063.
doi: 10.3390/rs14195063
3 SONG Yaolian, WANG Can, LI Dayan, et al. UAV small target detection algorithm based on improved YOLOv5s [J]. Journal of Zhejiang University: Engineering Science, 2024, 58(12): 2417-2426.
4 DENG Tianmin, YU Yang, CHEN Yuetian, et al. Small object detection algorithm for aerial photography based on adaptive compound convolution [J/OL]. Journal of Beijing University of Aeronautics and Astronautics, 2024: 1-14. (2024-06-19) [2024-11-19]. https://doi.org/10.13700/j.bh.100-5965.2024.0135.
5 CAO J, BAO W, SHANG H, et al. GCL-YOLO: a GhostConv-based lightweight YOLO network for UAV small object detection [J]. Remote Sensing, 2023, 15(20): 4932.
doi: 10.3390/rs15204932
6 WANG H, LIU C, CAI Y, et al. YOLOv8-QSD: an improved small object detection algorithm for autonomous vehicles based on YOLOv8 [J]. IEEE Transactions on Instrumentation and Measurement, 2024, 73: 2513916.
7 FENG F, HU Y, LI W, et al. Improved YOLOv8 algorithms for small object detection in aerial imagery [J]. Journal of King Saud University-Computer and Information Sciences, 2024, 36(6): 102113.
doi: 10.1016/j.jksuci.2024.102113
8 BODLA N, SINGH B, CHELLAPPA R, et al. Soft-NMS: improving object detection with one line of code [C]// Proceedings of the IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 5562-5570.
9 CHEN J, KAO S H, HE H, et al. Run, don’t walk: chasing higher FLOPS for faster neural networks [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023: 12021-12031.
10 YANG B, BENDER G, LE Q V, et al. CondConv: conditionally parameterized convolutions for efficient inference [C]// Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver: NeurIPS Foundation, 2020: 1296-1307.
11 CHEN Z, HE Z, LU Z M. DEA-net: single image dehazing based on detail-enhanced convolution and content-guided attention [J]. IEEE Transactions on Image Processing, 2024, 33: 1002-1015.
doi: 10.1109/TIP.2024.3354108
12 LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 936-944.
13 LIU S, QI L, QIN H, et al. Path aggregation network for instance segmentation [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 8759-8768.
14 SUNKARA R, LUO T. No more strided convolutions or pooling: a new CNN building block for low-resolution images and small objects [C]// Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Grenoble: Springer, 2023: 443-459.
15 CUI Y, REN W, KNOLL A. Omni-kernel network for image restoration [J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2024, 38(2): 1426-1434.
doi: 10.1609/aaai.v38i2.27907
17 ZHANG H, XU C, ZHANG S J. Inner-IoU: more effective intersection over union loss with auxiliary bounding box [EB/OL]. (2023-11-14) [2024-11-20]. https://arxiv.org/abs/2311.02877.
18 TIAN Z, SHEN C, CHEN H, et al. FCOS: a simple and strong anchor-free object detector [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(4): 1922-1933.
19 DU D, ZHU P, WEN L, et al. VisDrone-DET2019: the vision meets drone object detection in image challenge results [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision Workshop. Seoul: IEEE, 2019: 213-226.
20 YU W, YANG T, CHEN C. Towards resolving the challenge of long-tail distribution in UAV images for object detection [C]// Proceedings of the IEEE Winter Conference on Applications of Computer Vision. Waikoloa: IEEE, 2021: 3257-3266.
21 WANG C Y, BOCHKOVSKIY A, LIAO H M. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023: 7464-7475.
22 WANG C Y, YEH I H, LIAO H Y M. YOLOv9: learning what you want to learn using programmable gradient information [C]// European Conference on Computer Vision. Milan: Springer, 2025: 1-21.
23 SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization [J]. International Journal of Computer Vision, 2020, 128(2): 336-359.
doi: 10.1007/s11263-019-01228-7
24 XIA G S, BAI X, DING J, et al. DOTA: a large-scale dataset for object detection in aerial images [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 3974-3983.
25 LI K, WAN G, CHENG G, et al. Object detection in optical remote sensing images: a survey and a new benchmark [J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2020, 159: 296-307.