UAV small target detection algorithm based on improved YOLOv5s

doi:10.3785/j.issn.1008-973X.2024.12.001

Journal of ZheJiang University (Engineering Science)

2024, Vol. 58, Issue (12): 2417-2426


Yaolian SONG, Can WANG, Dayan LI*, Xinyi LIU

Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China


Abstract

An unmanned aerial vehicle (UAV) small target detection algorithm based on YOLOv5, termed FDB-YOLO, was proposed to address the frequent false detections and missed detections that traditional target detection algorithms produce on small targets in UAV aerial images. First, a small target detection layer was added to YOLOv5 and the feature fusion network was optimized to make full use of the fine-grained information about small targets in the shallow layers, thereby enhancing the network's perceptual capability. Second, a loss function, FPIoU, was proposed; it exploits the geometric properties of anchor boxes and uses a four-point positional bias constraint function to optimize anchor box localization and accelerate the convergence of the loss function. Third, a dynamic detection head (DyHead) incorporating attention mechanisms was adopted to improve detection capability through scale, spatial, and task awareness. Finally, a bi-level routing attention (BRA) mechanism was introduced into the feature extraction stage; it computes attention selectively over relevant regions and filters out irrelevant ones, thereby improving detection accuracy. Experiments on the VisDrone2019 dataset showed that, compared with the YOLOv5s baseline, the proposed algorithm improved precision by 3.7 percentage points, recall by 5.1 percentage points, mAP₅₀ by 5.8 percentage points, and mAP₅₀:₉₅ by 3.4 percentage points, and that it also outperformed current mainstream algorithms.
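The abstract names FPIoU but does not give its formula. As a purely illustrative sketch, and not the authors' loss, the following Python code shows one way a four-corner position penalty could be combined with IoU for axis-aligned boxes. The function name corner_penalized_iou_loss, the normalization by the enclosing-box diagonal, and the equal weighting of the four corners are all assumptions made here for illustration.

```python
from typing import List, Tuple

# A box is (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def corners(b: Box) -> List[Tuple[float, float]]:
    """Four corner points of a box, in a fixed order."""
    return [(b[0], b[1]), (b[2], b[1]), (b[0], b[3]), (b[2], b[3])]


def corner_penalized_iou_loss(pred: Box, target: Box) -> float:
    """Hypothetical loss (an assumption, not the paper's FPIoU):
    1 - IoU + mean squared distance between matching corners,
    normalized by the squared diagonal of the smallest enclosing box."""
    ex1, ey1 = min(pred[0], target[0]), min(pred[1], target[1])
    ex2, ey2 = max(pred[2], target[2]), max(pred[3], target[3])
    diag_sq = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    if diag_sq == 0:
        # Both boxes collapse to the same point; no distance term is defined.
        return 1.0 - iou(pred, target)
    dist_sq = sum(
        (px - tx) ** 2 + (py - ty) ** 2
        for (px, py), (tx, ty) in zip(corners(pred), corners(target))
    )
    return 1.0 - iou(pred, target) + dist_sq / (4.0 * diag_sq)


if __name__ == "__main__":
    target = (0.0, 0.0, 1.0, 1.0)
    print(corner_penalized_iou_loss((2.0, 0.0, 3.0, 1.0), target))
    print(corner_penalized_iou_loss((5.0, 0.0, 6.0, 1.0), target))
```

Plain 1 - IoU stays at 1 for every pair of non-overlapping boxes, whereas a position term of this kind keeps growing as the predicted box moves farther from the target. That property is the usual motivation for adding position constraints to IoU-based losses; whether FPIoU takes this exact form is not stated in the abstract.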

Key words: unmanned aerial vehicle perspective, small object detection layer, loss function, attention mechanism, YOLOv5

Received: 09 January 2024 Published: 25 November 2024

CLC: TP 391.4

Fund: National Natural Science Foundation of China (61962032); Yunnan Provincial Outstanding Youth Fund (202001AW070003); Yunnan Fundamental Research Program, General Project (202301AT070452).

Corresponding Authors: Dayan LI; E-mail: 39217149@qq.com; lidayan@kust.edu.cn


Cite this article:

Yaolian SONG,Can WANG,Dayan LI,Xinyi LIU. UAV small target detection algorithm based on improved YOLOv5s. Journal of ZheJiang University (Engineering Science), 2024, 58(12): 2417-2426.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2024.12.001 OR https://www.zjujournals.com/eng/Y2024/V58/I12/2417


Fig.1 Architecture of YOLOv5

Fig.2 Architecture of FDB-YOLO

Fig.3 Factors of FPIoU loss function

Fig.4 Structure of Dynamic Head

Fig.5 Bi-level routing attention structure diagram

Fig.6 Instance distribution of training dataset

Fig.7 Object size distribution diagram

Tab.1 Model training parameter settings

Tab.2 Analysis of ablation experimental results with different combinations of improved points

Tab.3 Detection accuracy and speed of different algorithms on VisDrone2019 dataset

Fig.8 Effect comparison of different loss functions

Fig.9 Visual comparison of detection effect before and after model improvement

Tab.4 Comparison of mAP₅₀ effect before and after model improvement



