Journal of Zhejiang University (Engineering Science)  2023, Vol. 57, Issue (10): 1933-1944    DOI: 10.3785/j.issn.1008-973X.2023.10.003
Computer Technology, Automation Technology
Small target pedestrian detection based on adaptive proliferation data enhancement and global feature fusion
Qing-lin AI, Jia-hao YANG, Jing-rui CUI
Key Laboratory of Special Purpose Equipment and Advanced Manufacturing Technology, Ministry of Education and Zhejiang Province, Zhejiang University of Technology, Hangzhou 310023, China
Abstract:

A small target pedestrian detection method based on the properties of vanishing points, adaptive proliferation data augmentation, and global context feature fusion was proposed to address the shortage of large-scale small-target pedestrian datasets and the poor performance of traditional pedestrian detection models on small targets. Multiple targets in an image were copied by using the properties of projective geometry and vanishing points, and projected to new locations through affine transformation, generating multiple small target samples with plausible size and background to complete the data augmentation. The hourglass structure was improved with a cross-stage partial network and lightweight operations, and the coordinate attention mechanism was integrated to strengthen the backbone network. A global feature fusion neck network (GFF-neck) was designed to fuse global features. Experimental results showed that the improved algorithm achieved a detection AP of 79.6% for the pedestrian category on the augmented WiderPerson dataset, and an mAP of 80.2% on the VOC dataset. An experimental test system was built for real-scene testing; the test results show that the proposed algorithm effectively improves the accuracy of small target pedestrian detection and recognition while meeting real-time requirements.
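The augmentation step described above (copying pedestrians and re-projecting them so their size stays consistent with the scene perspective) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the linear height-to-horizon scaling model, the function names, and the nearest-neighbor resize are assumptions made for the sketch.

```python
import numpy as np

def scale_for_row(y_foot, y_horizon, y_ref, h_ref):
    """Under a pinhole camera over a flat ground plane, the apparent height
    of a standing person at image row y_foot is proportional to the vertical
    distance from the horizon (vanishing line) to that row."""
    return h_ref * (y_foot - y_horizon) / (y_ref - y_horizon)

def paste_scaled(image, crop, y_horizon, src_foot_y, dst_x, dst_foot_y):
    """Copy a pedestrian crop to a new foot position, resizing it by the
    ratio of foot-row distances to the horizon (nearest-neighbor resize
    keeps the sketch dependency-free)."""
    h, w = crop.shape[:2]
    s = (dst_foot_y - y_horizon) / (src_foot_y - y_horizon)
    nh, nw = max(1, round(h * s)), max(1, round(w * s))
    rows = np.arange(nh) * h // nh          # nearest-neighbor source rows
    cols = np.arange(nw) * w // nw          # nearest-neighbor source cols
    resized = crop[rows][:, cols]
    y0, x0 = dst_foot_y - nh, dst_x         # anchor the crop at its feet
    image[y0:y0 + nh, x0:x0 + nw] = resized
    return image, (x0, y0, nw, nh)          # new box for the added annotation
```

Because the apparent height shrinks linearly to zero at the horizon, scaling by the ratio of foot-row distances to the vanishing line keeps pasted targets plausibly sized relative to the background, which is the property the adaptive proliferation augmentation exploits.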

Key words: vanishing point    data enhancement    global feature fusion    small target pedestrian detection    lightweight hourglass structure
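The coordinate attention mechanism that the abstract integrates into the backbone (Hou et al., CVPR 2021) factorizes channel attention into two direction-aware pooled vectors, so that the attention weights retain positional information along each axis. A minimal numpy sketch, with hypothetical weight matrices standing in for the 1×1 convolutions:

```python
import numpy as np

def coordinate_attention(x, w_reduce, w_h, w_w):
    """Sketch of coordinate attention on a single feature map x of shape
    (C, H, W): pool along each spatial axis, mix channels through a shared
    reduction, then gate the input with per-row and per-column weights."""
    C, H, W = x.shape
    ph = x.mean(axis=2)                       # (C, H): pool along width
    pw = x.mean(axis=1)                       # (C, W): pool along height
    y = np.concatenate([ph, pw], axis=1)      # (C, H + W)
    y = np.maximum(w_reduce @ y, 0.0)         # shared 1x1 conv + ReLU -> (C', H+W)
    gh = 1 / (1 + np.exp(-(w_h @ y[:, :H])))  # (C, H) gates along height
    gw = 1 / (1 + np.exp(-(w_w @ y[:, H:])))  # (C, W) gates along width
    return x * gh[:, :, None] * gw[:, None, :]
```

Unlike squeeze-and-excitation, which collapses all spatial information into one vector per channel, the two pooled directions let the gates localize responses along rows and columns, which is why the paper uses it to strengthen small-target features in the backbone.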
Received: 2022-10-25    Published: 2023-10-18
CLC:  TP 399  
Funding: National Natural Science Foundation of China (52075488); Zhejiang Provincial Natural Science Foundation (LY20E050023)
About the author: AI Qing-lin (1976—), male, professor, Ph.D., engaged in research on machine vision inspection technology and intelligent robot technology. orcid.org/0000-0002-9017-1916. E-mail: aqlaql@163.com

Cite this article:

Qing-lin AI, Jia-hao YANG, Jing-rui CUI. Small target pedestrian detection based on adaptive proliferation data enhancement and global feature fusion. Journal of Zhejiang University (Engineering Science), 2023, 57(10): 1933-1944.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2023.10.003        https://www.zjujournals.com/eng/CN/Y2023/V57/I10/1933

Fig. 1  Traditional copy-based augmentation method
Fig. 2  Effect after adding annotations
Fig. 3  Vertical vanishing point A and horizontal vanishing line BC in the three-vanishing-point model
Fig. 4  Detection results of vanishing points
Fig. 5  Schematic diagram of the projection of a spatial target onto a plane
Fig. 6  Heat map of target mapping coordinate probabilities
Fig. 7  Effect of small target pedestrian data augmentation
Fig. 8  Structure of the coordinate attention mechanism
Fig. 9  T-Sandglass hourglass structure based on the cross-stage partial network
Fig. 10  Global feature fusion neck network
Fig. 11  Overall network model structure
Fig. 12  Sample images from the WiderPerson dataset
| Network | Np/10⁶ | FLOPs/10⁹ | v/(frame·s⁻¹) | AP/% |
| --- | --- | --- | --- | --- |
| VGG | 26.35 | 31.44 | 72.2 | 74.25 |
| MobileNet-V2 | 3.43 | 0.72 | 378.1 | 69.03 |
| MobileNeXt | 3.48 | 0.76 | 360.3 | 70.55 |
| MobileNeXt+CA | 3.82 | 0.76 | 326.5 | 70.63 |
| MobileNeXt+T-Sandglass | 3.46 | 0.73 | 369.4 | 70.92 |
| MobileNeXt+CA+T-Sandglass | 3.80 | 0.73 | 332.4 | 71.46 |

Table 1  Performance of different backbone networks at input size 320
| Network | Np/10⁶ | FLOPs/10⁹ | v/(frame·s⁻¹) | AP/% |
| --- | --- | --- | --- | --- |
| VGG512 | 27.19 | 90.39 | 44.3 | 77.93 |
| MobileNet-V2 | 3.43 | 1.85 | 203.2 | 75.02 |
| MobileNeXt | 3.48 | 1.93 | 145.8 | 74.89 |
| MobileNeXt+CA | 3.82 | 1.94 | 142.6 | 75.13 |
| MobileNeXt+T-Sandglass | 3.46 | 1.90 | 177.8 | 75.52 |
| MobileNeXt+CA+T-Sandglass | 3.80 | 1.91 | 161.6 | 76.03 |

Table 2  Performance of different backbone networks at input size 512
| Backbone | Neck | Np/10⁶ | FLOPs/10⁹ | v/(frame·s⁻¹) | AP/% |
| --- | --- | --- | --- | --- | --- |
| ShuffleNet-V2 | SSD-neck | 1.70 | 0.71 | 123.5 | 68.21 |
| ShuffleNet-V2 | GFF-neck | 1.44 | 1.32 | 100.6 | 74.62 |
| MobileNet-V2 | SSD-neck | 3.43 | 0.76 | 378.1 | 69.03 |
| MobileNet-V2 | GFF-neck | 3.04 | 2.95 | 151.0 | 76.31 |
| MobileNeXt | SSD-neck | 3.48 | 0.76 | 360.3 | 70.55 |
| MobileNeXt | GFF-neck | 3.14 | 3.14 | 138.5 | 77.28 |

Table 3  Performance of the two neck structures with different backbone networks
Fig. 13  Comparison of detection results between classical and improved networks
| Backbone | Neck | Input size | Np/10⁶ | v/(frame·s⁻¹) | AP/% |
| --- | --- | --- | --- | --- | --- |
| VGG | SSD-neck | 300×300 | 26.35 | 72.2 | 74.25 |
| MobileNetV2 | YOLOv3 | 320×320 | 22.02 | 140.3 | 74.07 |
| MobileNetV2 | SSD-neck | 320×320 | 3.43 | 378.1 | 69.03 |
| MobileNeXt+ | GFF-neck | 320×320 | 3.18 | 128.6 | 78.05 |

Table 4  Detection results of classical and improved networks
| Backbone | Neck | Input size | Np/10⁶ | v/(frame·s⁻¹) | mAP/% |
| --- | --- | --- | --- | --- | --- |
| VGG | SSD-neck | 300×300 | 26.35 | 72.2 | 76.82 |
| MobileNetV2 | YOLOv3 | 320×320 | 22.02 | 140.3 | 76.13 |
| MobileNetV2 | SSD-neck | 320×320 | 3.43 | 378.1 | 71.64 |
| MobileNeXt+ | GFF-neck | 320×320 | 3.18 | 128.6 | 80.28 |

Table 5  Detection results of different networks on the VOC dataset
| Input size | Data augmentation | AP/% (MobileNetV2-SSD) | AP/% (MobileNeXt+-GFF) |
| --- | --- | --- | --- |
| 320×320 | No copying | 69.03 | 78.05 |
| 320×320 | Random copying | 69.78 | 78.81 |
| 320×320 | Adaptive proliferation | 70.25 | 79.61 |
| 512×512 | No copying | 75.02 | 81.86 |
| 512×512 | Random copying | 76.05 | 83.32 |
| 512×512 | Adaptive proliferation | 76.89 | 84.34 |

Table 6  Improvement in recognition accuracy from small target pedestrian data augmentation
| Dataset | Data augmentation | AP/% |
| --- | --- | --- |
| CityPersons | No copying | 45.04 |
| CityPersons | Random copying | 46.61 |
| CityPersons | Adaptive proliferation | 48.43 |
| Caltech | No copying | 68.34 |
| Caltech | Random copying | 69.52 |
| Caltech | Adaptive proliferation | 71.13 |

Table 7  Effect of data augmentation on the CityPersons and Caltech datasets
Fig. 14  Experimental platform and testing in a real environment
Fig. 15  Pedestrian detection results in a real environment
| Network model | AP/% |
| --- | --- |
| MobileNetV2-SSD | 81.13 |
| MobileNetV2-YOLOv3 | 85.53 |
| MobileNeXt+-GFF | 88.26 |
| MobileNeXt+-GFF (adaptive data augmentation) | 90.07 |

Table 8  Detection accuracy in a real environment