1. College of Electrical Engineering, Henan University of Technology, Zhengzhou 450001, China
2. School of Automation, Beijing Institute of Technology, Beijing 100081, China
A small-target detection method for unmanned aerial vehicles (UAVs) based on adaptive up-sampling and spatial correlation enhancement was proposed to address the false and missed detections caused by the small size of UAVs and the difficulty of extracting features against complex backgrounds. First, important contextual information was obtained by multi-scale dilated convolution, and an attention feature fusion module was used to suppress information conflicts during multi-scale feature fusion. Second, a new up-sampling method that adaptively fuses sub-pixel convolution and bilinear interpolation was adopted to balance the computational cost and to retain more UAV feature information. Finally, spatial correlation enhancement strategies for local and global spatial features were applied to deep features to improve sensitivity to foreground targets in complex backgrounds and to strengthen target expression while suppressing background noise. Ablation and comparative experiments were conducted on a self-made UAV dataset. Compared with the original YOLOv5 algorithm, the mAP0.5 and mAP0.5:0.95 of the proposed algorithm increased by 2.4 and 2.7 percentage points, respectively, and the detection speed reached 58.5 frames per second. The proposed algorithm was also verified on the VisDrone2019 dataset, where its mAP0.5 and mAP0.5:0.95 were higher than those of YOLOv5 by 4.6 and 1.3 percentage points, respectively.
Fig.2 Multi-scale contextual information and attentional feature fusion enhancement module
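The multi-scale dilated convolution mentioned in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' module: the dilation rates (1, 2, 3), the 3×3 kernel, and all function names are assumptions chosen for illustration, and the attention-based fusion step is omitted.

```python
def dilated_conv2d(img, kernel, dilation):
    """Same-size 2D cross-correlation with zero padding and a dilated square kernel."""
    h, w = len(img), len(img[0])
    k = len(kernel)
    half = (k // 2) * dilation  # offset that centres the dilated kernel
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    yy = y + i * dilation - half
                    xx = x + j * dilation - half
                    # Samples falling outside the image count as zero padding.
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[i][j] * img[yy][xx]
            out[y][x] = acc
    return out


def effective_kernel_size(k, dilation):
    """Spatial extent covered by a k x k kernel at the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


def multi_scale_context(img, kernel, dilations=(1, 2, 3)):
    """One feature map per dilation rate (hypothetical rates, assumed here)."""
    return [dilated_conv2d(img, kernel, d) for d in dilations]
```

Applying an all-ones 3×3 kernel to an image with a single non-zero pixel shows the sampling pattern spreading out as the dilation rate grows, while the number of kernel weights stays at nine.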
Fig.3 Improved up-sampling module
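The abstract describes an up-sampling step that adaptively fuses sub-pixel convolution with bilinear interpolation. The following pure-Python sketch shows one way such a fusion could look. It rests on assumptions not stated in the source: the fusion is taken to be a softmax-weighted sum of the two branches, the convolution that would normally produce the r×r channels for the sub-pixel branch is omitted (the channels are passed in directly), and every function name is hypothetical.

```python
import math


def pixel_shuffle(channels, r):
    """Rearrange r*r low-resolution maps (each H x W) into one (H*r) x (W*r) map."""
    h, w = len(channels[0]), len(channels[0][0])
    out = [[0.0] * (w * r) for _ in range(h * r)]
    for y in range(h * r):
        for x in range(w * r):
            c = (y % r) * r + (x % r)  # channel that owns this sub-pixel position
            out[y][x] = channels[c][y // r][x // r]
    return out


def bilinear_upsample(img, r):
    """Bilinear interpolation by an integer factor r (half-pixel centres, clamped edges)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * (w * r) for _ in range(h * r)]
    for y in range(h * r):
        sy = min(max((y + 0.5) / r - 0.5, 0.0), h - 1)
        y0 = int(sy)
        y1 = min(y0 + 1, h - 1)
        fy = sy - y0
        for x in range(w * r):
            sx = min(max((x + 0.5) / r - 0.5, 0.0), w - 1)
            x0 = int(sx)
            x1 = min(x0 + 1, w - 1)
            fx = sx - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            out[y][x] = top * (1 - fy) + bottom * fy
    return out


def adaptive_fuse(a, b, logits):
    """Softmax-weighted sum of two equally sized maps (assumed fusion rule)."""
    ea, eb = math.exp(logits[0]), math.exp(logits[1])
    wa, wb = ea / (ea + eb), eb / (ea + eb)
    return [[wa * p + wb * q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]
```

In a trained network the two logits would be learnable parameters, so the model could shift weight toward whichever branch preserves more target detail.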
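| Method | P/% | R/% | mAP0.5/% | mAP0.5:0.95/% | GFLOPs |
| --- | --- | --- | --- | --- | --- |
| Baseline model | 52.5 | 34.4 | 34.1 | 16.4 | 21.0 |
| Concatenation fusion | 51.8 | 35.1 | 34.8 | 17.1 | 26.0 |
| Weighted fusion | 53.0 | 37.4 | 35.5 | 17.0 | 22.6 |

Tab.1 Comparison experiments of feature fusion methods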
Fig.4 Spatial correlation enhancement
Fig.5 Sample images from self-made UAV dataset
| $A_{\mathrm{a}}$ | $\varphi_{\mathrm{n}}$/% |
| --- | --- |
| (0, 0.001] | 94.87 |
| (0.001, 0.01] | 5.08 |
| (0.01, 0.0122] | 0.05 |

Tab.2 Statistics of bounding boxes at different scales
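Statistics of the kind shown in Tab.2 can be produced by binning each label by its relative size. The sketch below assumes that $A_{\mathrm{a}}$ is the ratio of bounding-box area to image area and that $\varphi_{\mathrm{n}}$ is the percentage of boxes in each interval; neither definition is spelled out in this excerpt, and the function name and the sample boxes in the usage note are hypothetical.

```python
def area_ratio_shares(boxes, image_size, edges=(0.0, 0.001, 0.01, 0.0122)):
    """Percentage of boxes whose area ratio falls in each half-open bin (lo, hi].

    boxes: iterable of (width, height) in pixels.
    image_size: (width, height) of the image in pixels.
    """
    img_w, img_h = image_size
    counts = [0] * (len(edges) - 1)
    for w, h in boxes:
        ratio = (w * h) / (img_w * img_h)
        for i in range(len(counts)):
            if edges[i] < ratio <= edges[i + 1]:
                counts[i] += 1
                break  # each box belongs to at most one bin
    total = sum(counts)
    return [100.0 * c / total if total else 0.0 for c in counts]
```

For example, four made-up boxes of 10×10, 20×20, 100×50, and 150×150 pixels in a 1920×1080 image fall into the three intervals in the proportions 50%, 25%, and 25%.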
Fig.6 Distribution of UAV label size of self-made UAV dataset
Fig.7 Distribution of labels of each category in train set and validation set of VisDrone2019 dataset
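| Method | P/% | R/% | mAP0.5/% | mAP0.5:0.95/% | GFLOPs | FPS/(frames·s⁻¹) |
| --- | --- | --- | --- | --- | --- | --- |
| YOLOv5s | 92.9 | 70.5 | 73.7 | 29.8 | 16.3 | 122 |
| +MCIAFFE | 93.7 | 73.4 | 76.4 | 32.0 | 16.9 | 105.3 |
| +New up-sampling | 93.5 | 71.8 | 73.8 | 31.8 | 17.7 | 78.1 |
| +SCE | 93.2 | 71.7 | 73.5 | 30.1 | 20.0 | 66.7 |
| Proposed algorithm | 94.5 | 74.8 | 76.1 | 32.5 | 22.6 | 58.5 |

Tab.3 Results of ablation experiments of self-made UAV dataset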
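The gains quoted in the abstract can be checked directly against the YOLOv5s row and the proposed-algorithm row of Tab.3. The short sketch below does this arithmetic; the values are copied from the table, while the dictionary layout and the function name are illustrative choices.

```python
# Values copied from the first and last rows of Tab.3.
ABLATION = {
    "YOLOv5s": {"P": 92.9, "R": 70.5, "mAP0.5": 73.7,
                "mAP0.5:0.95": 29.8, "GFLOPs": 16.3, "FPS": 122.0},
    "Proposed": {"P": 94.5, "R": 74.8, "mAP0.5": 76.1,
                 "mAP0.5:0.95": 32.5, "GFLOPs": 22.6, "FPS": 58.5},
}


def gains(baseline, improved):
    """Absolute difference per metric, rounded to one decimal place."""
    return {name: round(improved[name] - baseline[name], 1) for name in baseline}


if __name__ == "__main__":
    for name, delta in gains(ABLATION["YOLOv5s"], ABLATION["Proposed"]).items():
        print(f"{name}: {delta:+.1f}")
```

The differences in mAP0.5 and mAP0.5:0.95 come out as 2.4 and 2.7, matching the abstract, at the cost of 6.3 additional GFLOPs and a drop of 63.5 frames per second.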
Fig.8 Change process of mAP0.5 during training
Fig.9 Comparison of detection results between YOLOv5 and proposed algorithm
| Method | Input size | P/% | R/% | mAP0.5/% | mAP0.5:0.95/% |
| --- | --- | --- | --- | --- | --- |
| SSD | 300×300 | - | - | 48.6 | - |
| RefineDet[30] | 512×512 | - | - | 63.5 | - |
| YOLOv4 | 416×412 | 77.6 | 66.0 | 64.4 | 18.8 |
| YOLOv5 | 640×640 | 92.9 | 70.5 | 73.7 | 29.8 |
| EdgeYOLO[31] | 640×640 | - | - | 63.1 | 24.5 |
| Scaled-YOLOv4[32] | 640×640 | 93.0 | 73.2 | 72.5 | 30.7 |
| MDSSD[33] | 300×300 | - | - | 59.3 | - |
| SuperYOLO[34] | 640×640 | 88.7 | 71.9 | 75.5 | 31.6 |
| Proposed algorithm | 640×640 | 94.5 | 74.8 | 76.1 | 32.5 |

Tab.4 Experimental results of different detection methods on self-made UAV dataset
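| Method | P/% | R/% | mAP0.5/% | mAP0.5:0.95/% | GFLOPs |
| --- | --- | --- | --- | --- | --- |
| YOLOv5 | 48.4 | 33.1 | 30.9 | 15.7 | 16.5 |
| Model A | 52.5 | 34.4 | 34.1 | 16.4 | 21.0 |
| Proposed algorithm | 53.0 | 37.4 | 35.5 | 17.0 | 22.6 |

Tab.5 Detection results on VisDrone2019 dataset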
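The two accuracy columns in these tables differ only in the IoU thresholds used to decide whether a detection counts as correct. Under the common COCO-style convention, which is assumed here because the excerpt does not define the metrics, mAP0.5 uses a single threshold of 0.5, while mAP0.5:0.95 averages over ten thresholds from 0.5 to 0.95 in steps of 0.05. The sketch below computes IoU and lists the thresholds a single prediction would satisfy; it is not a full mAP evaluator, and the function names are illustrative.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Ten thresholds: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def thresholds_matched(pred, gt):
    """IoU thresholds at which the prediction counts as a match for the ground truth."""
    value = iou(pred, gt)
    return [t for t in IOU_THRESHOLDS if value >= t]
```

Because small targets cover few pixels, a localisation error of a pixel or two lowers their IoU sharply, which is one reason the mAP0.5:0.95 values are much lower than the mAP0.5 values throughout.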
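| Category | mAP0.5 (YOLOv5) | mAP0.5 (proposed algorithm) | $\varDelta$ |
| --- | --- | --- | --- |
| All | 30.9 | 35.5 | 4.6 |
| car | 69.9 | 76.7 | 6.8 |
| pedestrian | 34.2 | 40.4 | 6.2 |
| motor | 35.2 | 39.8 | 4.6 |
| person | 27.7 | 31.8 | 4.8 |
| van | 34.2 | 38.6 | 4.4 |
| truck | 27.8 | 28.3 | 0.5 |
| bicycle | 9.3 | 11.0 | 1.7 |
| bus | 42.5 | 49.7 | 7.2 |
| tricycle | 18.8 | 21.6 | 2.8 |
| awning | 9.1 | 10.6 | 1.5 |

Tab.6 mAP0.5 of each category on VisDrone2019 dataset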
Fig.10 Detection results on challenge test set
[1] BALESTRIERI E, DAPONTE P, DE VITO L, et al. Sensors and measurements for UAV safety: an overview[J]. Sensors, 2021, 21 (24): 8253. doi: 10.3390/s21248253
[2] LIU Y, SUN P, WERGELES N, et al. A survey and performance evaluation of deep learning methods for small object detection[J]. Expert Systems with Applications, 2021, 172: 114602. doi: 10.1016/j.eswa.2021.114602
[3] KOUSHIK J. Understanding convolutional neural networks [EB/OL]. (2016-05-30). https://arxiv.org/abs/1605.09081.
[4] HAN Jun, YUAN Xiaoping, WANG Zhun, et al. UAV dense small target detection algorithm based on YOLOv5s[J]. Journal of Zhejiang University: Engineering Science, 2023, 57 (6): 1224-1233.
[5] GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Columbus: IEEE, 2014: 580-587.
[6] GIRSHICK R. Fast R-CNN [C]// Proceedings of the IEEE International Conference on Computer Vision. Santiago: IEEE, 2015: 1440-1448.
[7] REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39 (6): 1137-1149.
[8] HE K, GKIOXARI G, DOLLAR P, et al. Mask R-CNN [C]// Proceedings of the IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 2961-2969.
[9] ZHANG Yan, SUN Jingxue, SUN Yemei, et al. Lightweight object detection based on split attention and linear transformation[J]. Journal of Zhejiang University: Engineering Science, 2023, 57 (6): 1195-1204.
[10] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 779-788.
[11] LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector [C]// Proceedings of the 14th European Conference on Computer Vision. Berlin: ECCV, 2016: 21-37.
[12] WANG Q, QIAN Y, HU Y, et al. M2YOLOF: based on effective receptive fields and multiple-in-single-out encoder for object detection[J]. Expert Systems with Applications, 2023, 213: 118928. doi: 10.1016/j.eswa.2022.118928
[13] PENG C, ZHU M, REN H G, et al. Small object detection method based on weighted feature fusion and CSMA attention module[J]. Electronics, 2022, 11 (16): 2546. doi: 10.3390/electronics11162546
[14] MIN K, LEE G H, LEE S W. Attentional feature pyramid network for small object detection[J]. Neural Networks, 2022, 155: 439-450. doi: 10.1016/j.neunet.2022.08.029
[15] JU M R, LUO J N, ZHANG P P, et al. A simple and efficient network for small target detection[J]. IEEE Access, 2019, 7: 85771-85781. doi: 10.1109/ACCESS.2019.2924960
[16] DENG C F, WANG M M, LIU L, et al. Extended feature pyramid network for small object detection[J]. IEEE Transactions on Multimedia, 2021, 24: 1968-1979.
[17] HE X W, CHENG R, ZHENG Z L, et al. Small object detection in traffic scenes based on YOLO-MXANet[J]. Sensors, 2021, 21 (21): 7422. doi: 10.3390/s21217422
[18] JI S J, LING Q H, HAN F. An improved algorithm for small object detection based on YOLOv4 and multi-scale contextual information[J]. Computers and Electrical Engineering, 2023, 105: 108490. doi: 10.1016/j.compeleceng.2022.108490
[19] ZHANG Na, QI Xulei, BAO Xiaoan, et al. Single-stage object detection algorithm based on optimizing position prediction[J]. Journal of Zhejiang University: Engineering Science, 2022, 56 (4): 783-794.
[20] XIE Yu, BAO Ziqun, ZHANG Na, et al. Object detection algorithm based on feature enhancement and deep fusion[J]. Journal of Zhejiang University: Engineering Science, 2022, 56 (12): 2403-2415.
[21] BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: optimal speed and accuracy of object detection [EB/OL]. (2020-04-23). https://arxiv.org/abs/2004.10934v1.
[22] LIU S, QI L, QIN H F, et al. Path aggregation network for instance segmentation [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 8759-8768.
[23] LIU H Y, SUN F Q, GU J, et al. Sf-YOLOv5: a lightweight small object detection algorithm based on improved feature fusion mode[J]. Sensors, 2022, 22 (15): 5817. doi: 10.3390/s22155817
[24] ZHANG Yunzuo, GUO Wei, CAI Zhaoquan, et al. Remote sensing image target detection combining multi-scale and attention mechanism[J]. Journal of Zhejiang University: Engineering Science, 2022, 56 (11): 2215-2223.
[25] KIM M, KIM H, SUNG J, et al. High-resolution processing and sigmoid fusion modules for efficient detection of small objects in an embedded system[J]. Scientific Reports, 2023, 13 (1): 244. doi: 10.1038/s41598-022-27189-5
[26] CHEN L C, PAPANDREOU G, KOKKINOS I, et al. Semantic image segmentation with deep convolutional nets and fully connected CRFs [EB/OL]. (2014-12-22). https://arxiv.org/abs/1412.7062.
[27] ZHAN W, SUN C F, WANG M C, et al. An improved YOLOv5 real-time detection method for small objects captured by UAV[J]. Soft Computing, 2022, 26: 361-373. doi: 10.1007/s00500-021-06407-8
[28] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module [C]// Proceedings of the European Conference on Computer Vision. Munich: ECCV, 2018: 3-19.
[29] DU D W, ZHU P F, WEN L Y, et al. VisDrone-DET2019: the vision meets drone object detection in image challenge results [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops. Seoul: IEEE, 2019.
[30] ZHANG S F, WEN L Y, BIAN X, et al. Single-shot refinement neural network for object detection [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 4203-4212.
[31] LIU S H, ZHA J L, SUN J, et al. EdgeYOLO: an edge-real-time object detector [EB/OL]. (2023-02-15). https://arxiv.org/abs/2302.07483.
[32] WANG C Y, BOCHKOVSKIY A, LIAO H Y M. Scaled-YOLOv4: scaling cross stage partial network [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 13029-13038.
[33] CUI L S, MA R, LV P, et al. MDSSD: multi-scale deconvolutional single shot detector for small objects[J]. Science China Information Sciences, 2020, 63: 120113. doi: 10.1007/s11432-019-2723-1