Journal of Zhejiang University (Engineering Science)  2019, Vol. 53 Issue (6): 1218-1224    DOI: 10.3785/j.issn.1008-973X.2019.06.022
Computer and Automation Technology
Efficient multi-scale pedestrian detection algorithm with feature map aggregation
Yun CHEN, Xiao-dong CAI*, Xiao-xi LIANG, Meng WANG
School of Information and Communication, Guilin University of Electronic Technology, Guilin 541004, China
Abstract:

Pedestrian detection algorithms trained on hand-crafted features suffer from low accuracy and low efficiency. To address this, an efficient multi-scale pedestrian detection algorithm based on convolutional neural network feature map aggregation was proposed. A feature map aggregation network was designed to merge high-level and low-level feature maps, producing feature maps with both good spatial resolution and good semantic ability. A feature extension network was constructed to provide feature maps for multi-scale pedestrian detection. Candidate regions were redesigned and a multi-scale pedestrian detection network was built to improve localization accuracy. The feature map aggregation network, the feature extension network and the multi-scale pedestrian detection network were combined and trained end to end. The experimental results show that, compared with algorithms based on hand-crafted features, the proposed algorithm effectively improves the accuracy of pedestrian detection and localization, and that it can provide real-time detection on common hardware.

Key words: feature map aggregation    pedestrian detection    multi-scale    spatial resolution    semantic ability
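
The aggregation step named in the abstract, merging a high-level feature map with a low-level one, can be illustrated with a generic sketch. This is not the authors' network: the nearest-neighbour upsampling, the channel concatenation, and the names `upsample_nearest` and `aggregate` are all assumptions chosen for illustration, and plain nested lists stand in for tensors.

```python
from typing import List

# A feature map is indexed as [channel][row][col].
FeatureMap = List[List[List[float]]]


def upsample_nearest(fmap: FeatureMap, factor: int) -> FeatureMap:
    """Enlarge each channel by repeating every value `factor` times per axis."""
    out = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(factor)]
            rows.extend([list(wide) for _ in range(factor)])
        out.append(rows)
    return out


def aggregate(low: FeatureMap, high: FeatureMap) -> FeatureMap:
    """Upsample `high` to the spatial size of `low`, then stack the channels.

    Assumption: concatenation is used as the merge; the paper may merge
    the maps differently.
    """
    low_h, high_h = len(low[0]), len(high[0])
    if low_h % high_h != 0:
        raise ValueError("low-level height must be a multiple of high-level height")
    upsampled = upsample_nearest(high, low_h // high_h)
    if len(upsampled[0][0]) != len(low[0][0]):
        raise ValueError("widths do not match after upsampling")
    return low + upsampled


# Toy input: a 4x4 low-level map and a 2x2 high-level map, one channel each.
low_level = [[[1.0, 2.0, 3.0, 4.0] for _ in range(4)]]
high_level = [[[10.0, 20.0], [30.0, 40.0]]]
merged = aggregate(low_level, high_level)
```

The merged map keeps the 4x4 resolution of the low-level input while carrying the high-level channel alongside it, which is the intuition behind combining spatial detail with semantic content.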
Received: 2018-05-09    Published: 2019-05-22
CLC:  TP 391  
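Corresponding author: Xiao-dong CAI     E-mail: 1655770801@qq.com; caixiaodong@guet.edu.cn
About the author: Yun CHEN (born 1991), male, master's student, working on deep learning and image processing. orcid.org/0000-0001-8438-5734. E-mail: 1655770801@qq.com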

Cite this article:

Yun CHEN, Xiao-dong CAI, Xiao-xi LIANG, Meng WANG. Efficient multi-scale pedestrian detection algorithm with feature map aggregation. Journal of Zhejiang University (Engineering Science), 2019, 53(6): 1218-1224.

Link to this article:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2019.06.022
http://www.zjujournals.com/eng/CN/Y2019/V53/I6/1218

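Fig. 1  Aggregation structure for low-level and high-level feature maps based on the Visual Geometry Group (VGG) network
Fig. 2  Feature extension network structure based on multi-scale convolution
Fig. 3  Pedestrian detection network structure based on candidate-region classification and bounding-box regression on multi-scale feature maps
Fig. 4  Example images of different scenes in the INRIA pedestrian dataset
Fig. 5  Example images of different scenes in the ETH pedestrian dataset
Fig. 6  Miss rate versus average false positives per image on the INRIA pedestrian dataset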
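
Curves of miss rate against false positives per image, as in the figure captioned above, are commonly obtained by sweeping the detector's confidence threshold. The sketch below shows that bookkeeping under stated assumptions: detections have already been matched to ground truth, and the function name `miss_rate_curve` and the toy data are hypothetical. It does not reproduce the paper's evaluation protocol.

```python
from typing import List, Tuple


def miss_rate_curve(
    detections: List[Tuple[float, bool]],
    num_ground_truth: int,
    num_images: int,
) -> List[Tuple[float, float]]:
    """Return (false positives per image, miss rate) after each detection.

    `detections` holds (confidence score, is_true_positive) pairs pooled
    over the whole test set; matching to ground truth is assumed done.
    """
    # Lowering the threshold admits detections in descending score order.
    ordered = sorted(detections, key=lambda d: d[0], reverse=True)
    curve = []
    true_pos = 0
    false_pos = 0
    for _score, is_tp in ordered:
        if is_tp:
            true_pos += 1
        else:
            false_pos += 1
        fppi = false_pos / num_images
        miss_rate = 1.0 - true_pos / num_ground_truth
        curve.append((fppi, miss_rate))
    return curve


# Toy data: four detections over two images containing four pedestrians.
toy = [(0.9, True), (0.8, False), (0.7, True), (0.4, False)]
points = miss_rate_curve(toy, num_ground_truth=4, num_images=2)
```

Each point trades missed pedestrians against false alarms: accepting lower-confidence detections can only lower the miss rate, at the cost of more false positives per image.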
Table 1  Comparison of detection accuracy of different methods on the INRIA pedestrian dataset
Algorithm            Mr1/%
LDCF[25]             13.8
SketchTokens[26]     13.3
Roerei[27]           13.5
RandForest[28]       15.4
SAF R-CNN[29]        8.0
YOLOv2[16]           13.0
SSD                  14.9
Proposed algorithm   12.1
Fig. 7  Miss rate versus average false positives per image on the ETH pedestrian dataset
Table 2  Localization accuracy of the proposed algorithm and the SSD algorithm on the INRIA pedestrian dataset
J/%    P/%     Q/%
50     94.9    95.5
60     91.3    92.3
70     78.6    81.1
80     53.6    55.0
90     13.7    14.9
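
This page does not define the columns J, P and Q of Table 2. If J is read as a box-overlap threshold, which is an assumption and not something stated here, then a localization accuracy at a given threshold could be computed along the following lines. The intersection-over-union measure itself is standard; the function names `iou` and `fraction_above` and the toy boxes are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x1 < x2 and y1 < y2


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def fraction_above(pairs: List[Tuple[Box, Box]], threshold: float) -> float:
    """Share of (detection, ground truth) pairs whose IoU reaches `threshold`.

    Assumption: this is only one plausible reading of the table's columns.
    """
    hits = sum(1 for det, gt in pairs if iou(det, gt) >= threshold)
    return hits / len(pairs)


# Toy pairs: one perfect match and one box shifted by half its width.
pairs = [
    ((0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 10.0, 10.0)),
    ((0.0, 0.0, 10.0, 10.0), (5.0, 0.0, 15.0, 10.0)),
]
share_at_50 = fraction_above(pairs, 0.5)
```

Under this reading, raising the threshold demands tighter boxes, which would be consistent with the values in the table falling as J increases.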
Table 3  Comparison of the number of images detected per second by different algorithms
Algorithm            Nper-sec
LDCF[25]             1.7
SketchTokens[26]     1.0
Roerei[27]           1.0
RandForest[28]       4.6
SAF R-CNN[29]        1.7
YOLOv2[16]           67
SSD                  35
Proposed algorithm   33
1 XU Yan-wu, CAO Xian-bin, QIAO Hong. Survey on the latest development of pedestrian detection system and its key technologies expectation [J]. Acta Electronica Sinica, 2008, 36(5): 962-968. doi: 10.3321/j.issn:0372-2112.2008.05.023
2 WANG Su-yu, SHEN Lan-sun. Intelligent visual surveillance technology: a survey [J]. Journal of Image and Graphics, 2008, 36(5): 962-968.
3 QIAO Chuan-biao, WANG Su-yu, ZHUO Li, et al. Object detection and tracking for intelligent video surveillance [J]. Measurement and Control Technology, 2008, 27(5): 22-24. doi: 10.3969/j.issn.1000-8829.2008.05.007
4 PAPAGEORGIOU C, POGGIO T. A trainable system for object detection [J]. International Journal of Computer Vision, 2000, 38(1): 15-33. doi: 10.1023/A:1008162616689
5 VIOLA P, JONES M J. Robust real-time face detection [J]. International Journal of Computer Vision, 2004, 57(2): 137-154. doi: 10.1023/B:VISI.0000013087.49260.fb
6 VIOLA P, JONES M J, SNOW D. Detecting pedestrians using patterns of motion and appearance [J]. International Journal of Computer Vision, 2005, 63(2): 153-161. doi: 10.1007/s11263-005-6644-8
7 VIOLA P, JONES M J, SNOW D. Detecting pedestrians using patterns of motion and appearance [C] // 9th IEEE International Conference on Computer Vision. Nice: IEEE, 2003: 734-741.
8 DALAL N, TRIGGS B. Histograms of oriented gradients for human detection [C] // Computer Vision and Pattern Recognition. San Diego: IEEE, 2005: 886-893.
9 WANG X Y, HAN T X, YAN S. An HOG-LBP human detector with partial occlusion handling [C] // 2009 IEEE 12th International Conference on Computer Vision. Kyoto: IEEE, 2009: 32-39.
10 FREUND Y, SCHAPIRE R E. Experiments with a new boosting algorithm [C] // International Conference on Machine Learning. Bari: ICML, 1996: 148-156.
11 GIRSHICK R B. Fast R-CNN [C] // IEEE International Conference on Computer Vision. Santiago: IEEE, 2015: 1440-1448.
12 GIRSHICK R B, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation [C] // IEEE Conference on Computer Vision and Pattern Recognition. Columbus: IEEE, 2014: 580-587.
13 UIJLINGS J R R, VAN DE SANDE K E A, GEVERS T, et al. Selective search for object recognition [J]. International Journal of Computer Vision, 2013, 104(2): 154-171. doi: 10.1007/s11263-013-0620-5
14 REN S Q, HE K M, GIRSHICK R B, et al. Faster R-CNN: towards real-time object detection with region proposal networks [C] // Neural Information Processing Systems. Montreal: NIPS, 2015: 91-99.
15 LIU W, ANGUELOV D, ERHAN D. SSD: single shot multibox detector [C] // European Conference on Computer Vision. Amsterdam: ECCV, 2016: 21-37.
16 REDMON J, FARHADI A. YOLO9000: better, faster, stronger [C] // IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 6517-6525.
17 SZEGEDY C, LIU W, JIA Y Q, et al. Going deeper with convolutions [C] // IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE, 2015: 1-9.
18 SZEGEDY C, IOFFE S, VANHOUCKE V. Inception-v4, Inception-ResNet and the impact of residual connections on learning [C] // Association for the Advancement of Artificial Intelligence. San Francisco: AAAI, 2017: 4278-4284.
19 SZEGEDY C, VANHOUCKE V, IOFFE S, et al. Rethinking the inception architecture for computer vision [C] // IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 2818-2826.
20 AHMED E, JONES M, MARKS T K. An improved deep learning architecture for person re-identification [C] // IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE, 2015: 3908-3916.
21 KONG T, YAO A B, CHEN Y R, et al. HyperNet: towards accurate region proposal generation and joint object detection [C] // IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 845-853.
22 IOFFE S, SZEGEDY C. Batch normalization: accelerating deep network training by reducing internal covariate shift [C] // International Conference on Machine Learning. Lille: ICML, 2015: 448-456.
23 LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation [C] // IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE, 2015: 3431-3440.
24 ESS A, LEIBE B, VAN GOOL L. Depth and appearance for mobile scene analysis [C] // IEEE International Conference on Computer Vision. Rio de Janeiro: IEEE, 2007: 1-8.
25 NAM W, DOLLAR P, HAN J H. Local decorrelation for improved pedestrian detection [C] // Neural Information Processing Systems. Montreal: NIPS, 2014: 424-432.
26 LIM J, ZITNICK C, DOLLAR P. Sketch tokens: a learned mid-level representation for contour and object detection [C] // IEEE Conference on Computer Vision and Pattern Recognition. Portland: IEEE, 2013: 3158-3165.
27 BENENSON R, MATHIAS M, TUYTELAARS T. Seeking the strongest rigid detector [C] // IEEE Conference on Computer Vision and Pattern Recognition. Portland: IEEE, 2013: 3666-3673.
28 MARIN J, VAZQUEZ D, LOPEZ A, et al. Random forests of local experts for pedestrian detection [C] // IEEE International Conference on Computer Vision. Sydney: IEEE, 2013: 2592-2599.