浙江大学学报(工学版)  2024, Vol. 58 Issue (12): 2417-2426    DOI: 10.3785/j.issn.1008-973X.2024.12.001
Computer Technology
基于改进YOLOv5s的无人机小目标检测算法
宋耀莲,王粲,李大焱*,刘欣怡
昆明理工大学 信息工程与自动化学院,云南 昆明 650500
UAV small target detection algorithm based on improved YOLOv5s
Yaolian SONG, Can WANG, Dayan LI*, Xinyi LIU
Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
摘要:

为了解决传统目标检测算法对无人机(UAV)航拍小目标存在错漏检严重的问题,提出基于YOLOv5的无人机小目标检测算法FDB-YOLO. 在YOLOv5的基础上增加小目标检测层,优化特征融合网络,充分利用网络浅层小目标细粒信息,提升网络感知能力;提出损失函数FPIoU,通过充分利用锚框的几何性质,采用四点位置偏置约束函数,优化锚框定位,加快损失函数收敛速度;采用结合注意力机制的动态目标检测头(DyHead),通过增加尺度、空间、任务感知提升算法检测能力;在特征提取部分引入双级路由注意力机制(BRA),通过有选择性地对相关区域进行计算,过滤无关区域,提升模型的检测精确度. 实验证明,在VisDrone2019数据集上,本算法与YOLOv5s目标检测算法相比,精确率提升了3.7个百分点,召回率提升了5.1个百分点,mAP50增加了5.8个百分点,mAP50∶95增加3.4个百分点,并且相比当前主流算法而言都有更加优秀的表现.

关键词: 无人机视角; 小目标检测层; 损失函数; 注意力机制; YOLOv5
Abstract:

An unmanned aerial vehicle (UAV) small target detection algorithm based on YOLOv5, termed FDB-YOLO, was proposed to address the severe misdetection and omission of small targets by traditional detection algorithms in UAV aerial imagery. First, a small target detection layer was added to YOLOv5 and the feature fusion network was optimized to fully exploit the fine-grained small-target information in shallow layers, enhancing the network's perceptual capability. Second, a novel loss function, FPIoU, was introduced, which capitalizes on the geometric properties of anchor boxes and uses a four-point positional bias constraint function to optimize anchor-box positioning and accelerate loss convergence. Third, a dynamic target detection head (DyHead) incorporating an attention mechanism was employed to strengthen detection through increased scale, spatial, and task awareness. Finally, a bi-level routing attention (BRA) mechanism was integrated into the feature extraction stage, selectively computing relevant regions and filtering out irrelevant ones to improve detection accuracy. Experiments on the VisDrone2019 dataset showed that, compared with the YOLOv5s baseline, the proposed algorithm improved precision by 3.7 percentage points, recall by 5.1 percentage points, mAP50 by 5.8 percentage points, and mAP50:95 by 3.4 percentage points, outperforming current mainstream algorithms.
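The FPIoU loss is described here only at a high level (a four-point positional bias constraint on anchor-box corners). As a hedged illustration of that idea, not the paper's exact formulation, an IoU loss plus a normalized four-corner distance penalty can be sketched as follows; the function names are mine:

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(box_a) + area(box_b) - inter
    return inter / union if union > 0 else 0.0

def corner_penalty(box_a, box_b):
    """Mean squared distance between the four corresponding corners,
    normalized by the squared diagonal of the smallest enclosing box."""
    corners = lambda b: [(b[0], b[1]), (b[2], b[1]), (b[0], b[3]), (b[2], b[3])]
    cw = max(box_a[2], box_b[2]) - min(box_a[0], box_b[0])
    ch = max(box_a[3], box_b[3]) - min(box_a[1], box_b[1])
    diag2 = cw * cw + ch * ch
    d2 = sum((pa[0] - pb[0]) ** 2 + (pa[1] - pb[1]) ** 2
             for pa, pb in zip(corners(box_a), corners(box_b)))
    return d2 / (4.0 * diag2) if diag2 > 0 else 0.0

def fpiou_style_loss(pred, target):
    """Illustrative loss: 1 - IoU plus a four-corner position penalty."""
    return 1.0 - iou(pred, target) + corner_penalty(pred, target)
```

As in DIoU/CIoU-style losses, the corner term keeps a useful gradient even when the boxes do not overlap, which is what allows such a loss to converge faster than plain 1 − IoU.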

Key words: unmanned aerial vehicle perspective; small object detection layer; loss function; attention mechanism; YOLOv5
Received: 2024-01-09; Published: 2024-11-25
CLC:  TP 391.4  
Funding: National Natural Science Foundation of China (61962032); Yunnan Provincial Outstanding Youth Fund (202001AW070003); Yunnan Provincial Basic Research Program General Project (202301AT070452).
Corresponding author: Dayan LI. E-mail: 39217149@qq.com; lidayan@kust.edu.cn
First author: Yaolian SONG (1977—), female, associate professor, Ph.D., researching artificial intelligence and mobile communications. orcid.org/0009-0007-7534-9644. E-mail: 39217149@qq.com

Cite this article:

宋耀莲,王粲,李大焱,刘欣怡. 基于改进YOLOv5s的无人机小目标检测算法[J]. 浙江大学学报(工学版), 2024, 58(12): 2417-2426.

Yaolian SONG, Can WANG, Dayan LI, Xinyi LIU. UAV small target detection algorithm based on improved YOLOv5s. Journal of Zhejiang University (Engineering Science), 2024, 58(12): 2417-2426.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2024.12.001        https://www.zjujournals.com/eng/CN/Y2024/V58/I12/2417

Fig. 1  Structure of YOLOv5
Fig. 2  Structure of FDB-YOLO
Fig. 3  Parameters of the FPIoU loss function
Fig. 4  Structure of Dynamic Head
Fig. 5  Structure of the bi-level routing attention mechanism
Fig. 6  Distribution of instance counts in the training set
Fig. 7  Distribution of target sizes
Parameter | Value
Batch size | 32
Epochs | 150
Image size | 640×640
Initial learning rate | 0.01
Cosine annealing parameter | 0.01
Learning rate momentum | 0.937
Weight decay | 0.0005
Table 1  Model training parameter settings
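For reference, the training settings above can be expressed with YOLOv5's standard hyperparameter names (`lr0`, `lrf`, `momentum`, `weight_decay`, as used in its `hyp` files); mapping the "cosine annealing parameter" to `lrf` (the final learning-rate factor of the cosine schedule) is my assumption, since the table does not name the keys:

```python
# Training settings from Table 1, expressed as YOLOv5-style arguments.
train_args = {
    "imgsz": 640,    # input image size (640 x 640)
    "batch": 32,     # batch size
    "epochs": 150,   # training epochs
}

# Hyperparameters; key names follow YOLOv5's hyp files.
hyp = {
    "lr0": 0.01,            # initial learning rate
    "lrf": 0.01,            # assumed: cosine-annealing final LR factor
    "momentum": 0.937,      # SGD momentum
    "weight_decay": 0.0005, # optimizer weight decay
}
```

Under this reading, the learning rate is annealed from `lr0` down to `lr0 * lrf` (i.e. 0.01 → 0.0001) over the 150 epochs.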
Experiment | P/% | R/% | mAP50/% | mAP50:95/% | Params/M | GFLOPs
A | 46.5 | 34.1 | 34.1 | 18.8 | 7.037 | 15.8
B | 48.7 | 37.8 | 37.8 | 20.9 | 7.446 | 18.7
C | 46.2 | 34.9 | 34.6 | 19.0 | 7.046 | 16.0
D | 45.2 | 34.0 | 33.7 | 18.3 | 7.336 | 16.7
E | 45.4 | 33.9 | 33.2 | 17.7 | 8.102 | 57.3
F | 46.0 | 34.7 | 34.3 | 18.5 | 8.392 | 58.0
G | 49.8 | 37.7 | 37.9 | 21.1 | 7.182 | 18.7
H | 48.8 | 39.1 | 39.0 | 21.6 | 7.487 | 21.4
I | 48.8 | 37.7 | 37.9 | 20.9 | 8.248 | 60.2
J | 49.5 | 39.5 | 39.2 | 21.7 | 7.476 | 21.1
K | 49.6 | 39.2 | 39.3 | 21.8 | 8.542 | 62.7
L | 50.2 | 39.2 | 39.9 | 22.2 | 8.531 | 62.4
Table 2  Ablation results for combinations of the improvements (P2 detection layer, FPIoU, DyHead[8], BRA[9])
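The Params/M and GFLOPs columns count model weights (in millions) and forward-pass floating-point operations (in billions). For a single standard convolution these follow the usual textbook accounting, sketched below (a generic sketch, not the authors' profiling code):

```python
def conv2d_params(c_in, c_out, k, bias=True):
    """Weight count of a k x k Conv2d: one k*k*c_in kernel (+ bias) per output channel."""
    return (k * k * c_in + (1 if bias else 0)) * c_out

def conv2d_flops(c_in, c_out, k, h_out, w_out):
    """FLOPs for one forward pass: one multiply and one add per kernel
    weight, per output position and output channel (bias ignored)."""
    return 2 * k * k * c_in * h_out * w_out * c_out
```

For example, a 3×3 stem convolution taking an RGB input to 16 channels has `conv2d_params(3, 16, 3) = 448` weights; summing such terms over every layer yields the Params/M and GFLOPs figures reported in Table 2.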
Model | mAP50/% | mAP50:95/% | FPS
YOLOv3[11] | 20.4 | 12.2 | 31
YOLOv4[12] | 33.3 | 17.6 | 36
YOLOv5s | 34.1 | 18.8 | 116
YOLOv6[21] | 33.8 | 18.4 | 48
YOLOv7[22] | 35.2 | 19.8 | 87
YOLOv8 | 39.4 | 22.1 | 82
Faster R-CNN[2] | 22.5 | 15.1 | 15
RetinaNet[4] | 30.1 | 18.5 |
Proposed (FDB-YOLO) | 39.9 | 22.2 | 53
Table 3  Detection accuracy and speed of different algorithms on the VisDrone2019 dataset
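The mAP50 and mAP50:95 columns follow the usual detection-metric definitions: AP is the area under the precision-recall curve at a given IoU threshold, mAP50 averages it over classes at IoU = 0.5, and mAP50:95 additionally averages over IoU thresholds from 0.50 to 0.95 in steps of 0.05. A minimal AP computation (all-points interpolation; not the authors' evaluation code) looks like:

```python
def average_precision(recalls, precisions):
    """Area under a precision-recall curve (all-points interpolation).

    `recalls` must be sorted ascending; each precision is the detector's
    precision at the operating point reaching that recall.
    """
    mrec = [0.0] + list(recalls)
    mpre = [0.0] + list(precisions) + [0.0]
    # Replace each precision with the max precision at any higher recall
    # (the monotone envelope used by VOC/COCO-style evaluation).
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    ap = 0.0
    for i in range(1, len(mrec)):
        ap += (mrec[i] - mrec[i - 1]) * mpre[i]
    return ap
```

A perfect detector (precision 1.0 at recall 1.0) scores AP = 1.0; averaging such per-class APs over the ten VisDrone categories at IoU 0.5 gives the mAP50 column above.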
Fig. 8  Comparison of different loss functions
Fig. 9  Visual comparison of detection results before and after model improvement
Category | YOLOv5s mAP50/% | FDB-YOLO mAP50/%
pedestrian | 32.4 | 42.9
people | 24.9 | 31.8
bicycle | 9.84 | 12.5
car | 59.2 | 71.0
van | 28.8 | 34.8
truck | 27.6 | 32.0
tricycle | 20.1 | 22.5
awning-tricycle | 8.21 | 10.6
bus | 40.6 | 47.3
motor | 30.2 | 39.0
Table 4  Per-class mAP50 before and after model improvement
1 GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Columbus: IEEE, 2014: 580–587.
2 REN S, HE K, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137–1149.
3 REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Las Vegas: IEEE, 2016: 779–788.
4 LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection [C]// Proceedings of the IEEE International Conference on Computer Vision . Venice: IEEE, 2017: 2999–3007.
5 ZHU X, LYU S, WANG X, et al. TPH-YOLOv5: improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision . Montreal: IEEE, 2021: 2778–2788.
6 胡俊, 顾晶晶, 王秋红. 基于遥感图像的多模态小目标检测 [J]. 图学学报, 2022, 43(2): 197–204.
HU Jun, GU Jingjing, WANG Qiuhong. Multimodal small target detection based on remote sensing image [J]. Journal of Graphics, 2022, 43(2): 197–204.
7 韩俊, 袁小平, 王准, 等. 基于YOLOv5s的无人机密集小目标检测算法 [J]. 浙江大学学报(工学版), 2023, 57(6): 1224–1233.
HAN Jun, YUAN Xiaoping, WANG Zhun, et al. UAV dense small target detection algorithm based on YOLOv5s [J]. Journal of Zhejiang University (Engineering Science), 2023, 57(6): 1224–1233.
8 DAI X, CHEN Y, XIAO B, et al. Dynamic head: unifying object detection heads with attentions [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Nashville: IEEE, 2021: 7369–7378.
9 ZHU L, WANG X, KE Z, et al. BiFormer: vision transformer with bi-level routing attention [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Vancouver: IEEE, 2023: 10323–10333.
10 REDMON J, FARHADI A. YOLO9000: better, faster, stronger [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Honolulu: IEEE, 2017: 6517–6525.
11 REDMON J, FARHADI A. YOLOv3: an incremental improvement [EB/OL]. (2018-04-08)[2023-11-20]. https://arxiv.org/abs/1804.02767.
12 BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: optimal speed and accuracy of object detection [EB/OL]. (2020-04-23)[2023-11-20]. https://arxiv.org/abs/2004.10934.
13 LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Honolulu: IEEE, 2017: 936–944.
14 LIU S, QI L, QIN H, et al. Path aggregation network for instance segmentation [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Salt Lake City: IEEE, 2018: 8759–8768.
15 ZHENG Z, WANG P, REN D, et al. Enhancing geometric factors in model learning and inference for object detection and instance segmentation [J]. IEEE Transactions on Cybernetics, 2021, 52(8): 8574–8586.
16 WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module [C]// Proceedings of the European Conference on Computer Vision. Munich: Springer, 2018: 3–19.
17 HU J, SHEN L, SUN G. Squeeze-and-excitation networks [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Salt Lake City: IEEE, 2018: 7132–7141.
18 HOU Q, ZHOU D, FENG J. Coordinate attention for efficient mobile network design [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Nashville: IEEE, 2021: 13708–13717.
19 WANG Q, WU B, ZHU P, et al. ECA-Net: efficient channel attention for deep convolutional neural networks [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle: IEEE, 2020: 11531–11539.
20 DU D, ZHU P, WEN L, et al. VisDrone-DET2019: the vision meets drone object detection in image challenge results [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops . Seoul: IEEE, 2019: 213–226.
21 LI C, LI L, JIANG H, et al. YOLOv6: a single-stage object detection framework for industrial applications [EB/OL]. (2022-09-07)[2023-11-20]. https://arxiv.org/abs/2209.02976.
22 WANG C Y, BOCHKOVSKIY A, LIAO H Y M. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Vancouver: IEEE, 2023: 7464–7475.
23 REZATOFIGHI H, TSOI N, GWAK J Y, et al. Generalized intersection over union: a metric and a loss for bounding box regression [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Long Beach: IEEE, 2019: 658–666.
24 ZHENG Z, WANG P, LIU W, et al. Distance-IoU loss: faster and better learning for bounding box regression [C]// Proceedings of the AAAI Conference on Artificial Intelligence . New York: AAAI, 2020, 34(7): 12993–13000.
25 GEVORGYAN Z. SIoU loss: more powerful learning for bounding box regression [EB/OL]. (2022-05-25)[2023-11-20]. https://arxiv.org/abs/2205.12740.