Small target pedestrian detection based on adaptive proliferation data enhancement and global feature fusion
Qing-lin AI, Jia-hao YANG, Jing-rui CUI
Key Laboratory of Special Purpose Equipment and Advanced Manufacturing Technology, Ministry of Education and Zhejiang Province, Zhejiang University of Technology, Hangzhou 310023, China
A global context feature fusion method for small-target pedestrian detection, combined with vanishing-point-based adaptive data augmentation, was proposed to address the scarcity of small-scale pedestrian samples and the poor performance of traditional pedestrian detection models on small targets. Using the properties of projective geometry and vanishing points, multiple targets in an image were copied and projected to new locations through affine transformation, generating small-target samples with plausible size and background to complete the data augmentation. The hourglass (sandglass) structure was improved with a cross-stage partial network and lightweight operations, and the coordinate attention mechanism was integrated to strengthen the backbone network. A global feature fusion neck network (GFF-neck) was designed to fuse global features. Experimental results showed that the improved algorithm achieved an AP of 79.6% for the pedestrian category on the augmented WiderPerson dataset and an mAP of 80.2% on the VOC dataset. An experimental test system was built for real-scene evaluation, and the results show that the proposed algorithm effectively improves the accuracy of small-target pedestrian detection and recognition while meeting real-time requirements.
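As a rough illustration of the augmentation idea (not the paper's implementation), the sketch below copies one pedestrian box and rescales it according to the standard ground-plane perspective rule that apparent height is proportional to the distance of the feet from the horizon (the horizontal vanishing line). The function name paste_scaled_copy, the uniform sampling of the new foot row, and the assumption that the horizon is a single image row y_horizon are illustrative simplifications; the paper instead samples paste locations from a probability heat map (Fig.6) and applies a full affine projection.

```python
# Simplified, hedged sketch of perspective-consistent copy-paste augmentation.
# Assumes the horizon (horizontal vanishing line) is roughly the image row y_horizon
# and lies well inside the image; not the paper's exact procedure.
import numpy as np
import cv2


def paste_scaled_copy(image, box, y_horizon, rng=None):
    """Copy the pedestrian in `box` to a new row, rescaled by perspective.

    Under a pinhole camera over a flat ground plane, apparent pedestrian height is
    proportional to the distance of the feet from the horizon, i.e.
    h_new / h_old = (y_foot_new - y_horizon) / (y_foot_old - y_horizon).
    """
    rng = rng or np.random.default_rng()
    h_img, w_img = image.shape[:2]
    x1, y1, x2, y2 = box                      # original pedestrian box in pixels
    crop = image[y1:y2, x1:x2]

    # Sample a new foot row strictly below the horizon so the scale stays positive.
    y_foot_new = int(rng.uniform(y_horizon + 10, h_img))
    scale = (y_foot_new - y_horizon) / max(y2 - y_horizon, 1)

    new_h = max(int((y2 - y1) * scale), 2)
    new_w = min(max(int((x2 - x1) * scale), 2), w_img)
    resized = cv2.resize(crop, (new_w, new_h), interpolation=cv2.INTER_LINEAR)

    # Keep the feet anchored at y_foot_new; clip at the top edge if necessary.
    x_new = int(rng.uniform(0, w_img - new_w + 1))
    ph = min(new_h, y_foot_new)
    out = image.copy()
    out[y_foot_new - ph:y_foot_new, x_new:x_new + new_w] = resized[new_h - ph:]
    return out, (x_new, y_foot_new - ph, x_new + new_w, y_foot_new)
```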
Qing-lin AI, Jia-hao YANG, Jing-rui CUI. Small target pedestrian detection based on adaptive proliferation data enhancement and global feature fusion. Journal of Zhejiang University (Engineering Science), 2023, 57(10): 1933-1944.
Fig.1  Traditional replication-based augmentation methods
Fig.2  Effect after adding dimensions
Fig.3  Vertical vanishing point A and horizontal vanishing line BC in the three-vanishing-point model
Fig.4  Detection result of vanishing point
Fig.5  Schematic diagram of projection of a spatial target onto a plane
Fig.6  Heat map of target mapping coordinate probability
Fig.7  Effect of data augmentation for small-target pedestrians
Fig.8  Coordinate attention mechanism structure
Fig.9  T-Sandglass structure based on cross-stage partial network
Fig.10  Global feature fusion neck
Fig.11  Overall network model structure
Fig.12  Sample images from the WiderPerson dataset
Network                      Np/10⁶   FLOPs/10⁹   v/(frame·s⁻¹)   AP/%
VGG                          26.35    31.44       72.2            74.25
MobileNet-V2                 3.43     0.72        378.1           69.03
MobileNeXt                   3.48     0.76        360.3           70.55
MobileNeXt+CA                3.82     0.76        326.5           70.63
MobileNeXt+T-Sandglass       3.46     0.73        369.4           70.92
MobileNeXt+CA+T-Sandglass    3.80     0.73        332.4           71.46
Tab.1  Performance of each backbone network when input size is 320
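For readers unfamiliar with the "CA" entries in Tab.1 and Tab.2: they denote the coordinate attention block (Hou et al., CVPR 2021). Below is a minimal PyTorch sketch following that published formulation; the reduction ratio, Hardswish activation, and class name are illustrative assumptions, not necessarily the exact configuration used in the paper.

```python
import torch
import torch.nn as nn


class CoordinateAttention(nn.Module):
    """Coordinate attention: factorizes global pooling into per-row and per-column
    pooling, then produces direction-aware attention maps along height and width."""

    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)            # reduced channel width (assumption)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool over width  -> (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool over height -> (B, C, 1, W)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        x_h = self.pool_h(x)                           # (B, C, H, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)       # (B, C, W, 1)
        y = torch.cat([x_h, x_w], dim=2)               # shared 1x1 conv over both axes
        y = self.act(self.bn1(self.conv1(y)))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                      # (B, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (B, C, 1, W)
        return x * a_h * a_w                           # reweight features per row and column


# Usage: attention-reweighted output keeps the input shape.
# ca = CoordinateAttention(64); out = ca(torch.randn(2, 64, 40, 40))
```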
Network                      Np/10⁶   FLOPs/10⁹   v/(frame·s⁻¹)   AP/%
VGG512                       27.19    90.39       44.3            77.93
MobileNet-V2                 3.43     1.85        203.2           75.02
MobileNeXt                   3.48     1.93        145.8           74.89
MobileNeXt+CA                3.82     1.94        142.6           75.13
MobileNeXt+T-Sandglass       3.46     1.90        177.8           75.52
MobileNeXt+CA+T-Sandglass    3.80     1.91        161.6           76.03
Tab.2  Performance of each backbone network when input size is 512
Backbone network   Neck network   Np/10⁶   FLOPs/10⁹   v/(frame·s⁻¹)   AP/%
ShuffleNet-V2      SSD-neck       1.70     0.71        123.5           68.21
ShuffleNet-V2      GFF-neck       1.44     1.32        100.6           74.62
MobileNet-V2       SSD-neck       3.43     0.76        378.1           69.03
MobileNet-V2       GFF-neck       3.04     2.95        151.0           76.31
MobileNeXt         SSD-neck       3.48     0.76        360.3           70.55
MobileNeXt         GFF-neck       3.14     3.14        138.5           77.28
Tab.3  Performance of the two neck structures with different backbone networks
Fig.13  Comparison of detection effect between the classical networks and the improved network
Backbone network   Neck network   Input size   Np/10⁶   v/(frame·s⁻¹)   AP/%
VGG                SSD-neck       300×300      26.35    72.2            74.25
MobileNetV2        YOLOv3         320×320      22.02    140.3           74.07
MobileNetV2        SSD-neck       320×320      3.43     378.1           69.03
MobileNeXt+        GFF-neck       320×320      3.18     128.6           78.05
Tab.4  Detection effect of the classical networks and the improved network
Backbone network   Neck network   Input size   Np/10⁶   v/(frame·s⁻¹)   mAP/%
VGG                SSD-neck       300×300      26.35    72.2            76.82
MobileNetV2        YOLOv3         320×320      22.02    140.3           76.13
MobileNetV2        SSD-neck       320×320      3.43     378.1           71.64
MobileNeXt+        GFF-neck       320×320      3.18     128.6           80.28
Tab.5  Detection results of different networks on the VOC dataset
Input size   Data augmentation        AP/% (MobileNetV2-SSD)   AP/% (MobileNeXt+-GFF)
320×320      No replication           69.03                    78.05
320×320      Random replication       69.78                    78.81
320×320      Adaptive proliferation   70.25                    79.61
512×512      No replication           75.02                    81.86
512×512      Random replication       76.05                    83.32
512×512      Adaptive proliferation   76.89                    84.34
Tab.6  Effect of small-target pedestrian data augmentation on recognition accuracy
Dataset       Data augmentation        AP/%
CityPersons   No replication           45.04
CityPersons   Random replication       46.61
CityPersons   Adaptive proliferation   48.43
Caltech       No replication           68.34
Caltech       Random replication       69.52
Caltech       Adaptive proliferation   71.13
Tab.7  Data augmentation performance on CityPersons and Caltech
Fig.14  Experimental platform and detection effect testing
Fig.15  Pedestrian detection effect in the actual environment
Network model                                   AP/%
MobileNetV2-SSD                                 81.13
MobileNetV2-YOLOv3                              85.53
MobileNeXt+-GFF                                 88.26
MobileNeXt+-GFF (adaptive data augmentation)    90.07
Tab.8  Detection accuracy in the actual environment