Journal of Zhejiang University (Engineering Science)  2019, Vol. 53 Issue (3): 533-540    DOI: 10.3785/j.issn.1008-973X.2019.03.014
Computer Technology
Multi-scale convolution target detection algorithm with feature pyramid
Zhi-jie LIN 1,2, Zhuang LUO 2, Lei ZHAO 2,*, Dong-ming LU 2
1. School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou 310023, China
2. School of Computer Science, Zhejiang University, Hangzhou 310027, China
Abstract:

A feature pyramid multi-scale network structure was constructed on the basis of the region proposal network and combined with fully convolutional operations to detect small targets and class-independent targets. To improve the detection accuracy for small targets in images, a three-layer pyramid network based on lateral-connection fusion was constructed, which makes full use of convolutional image features at lower semantic levels. To improve the robustness of class-independent image target detection, a dedicated non-maximum suppression algorithm was proposed, which eliminates redundant target windows when filtering overlapping targets and refines the locations of the target windows. Experimental results on the PASCAL VOC 2007, PASCAL VOC 2012 and ancient painting datasets show that the proposed algorithm achieves higher detection accuracy than existing algorithms for small targets, multi-scale targets and class-independent targets.

Key words: image target detection; image feature pyramid; multi-scale full convolution; small target detection; class-independent target detection
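
The lateral-connection fusion mentioned in the abstract can be illustrated with a small sketch. This is not the authors' network: it assumes single-channel feature maps stored as nested lists, nearest-neighbour 2x upsampling, and element-wise addition, and it omits the learned convolutions a real feature pyramid would apply. The names `c3`, `c4`, `c5`, `upsample2x` and `fuse_top_down` are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D feature map (list of rows)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each value horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat each row vertically
    return out


def fuse_top_down(levels):
    """Fuse pyramid levels from the coarsest to the finest.

    `levels` is ordered fine -> coarse, and each level is assumed to be
    exactly half the height and width of the previous one.
    """
    fused = [levels[-1]]                           # the coarsest level is kept as is
    for lateral in reversed(levels[:-1]):
        up = upsample2x(fused[0])                  # bring the coarser map to this size
        merged = [[a + b for a, b in zip(r1, r2)]  # lateral (side) connection: add
                  for r1, r2 in zip(lateral, up)]
        fused.insert(0, merged)
    return fused


# Three toy levels standing in for a three-layer pyramid (assumed values).
c3 = [[1.0] * 4 for _ in range(4)]  # finest, 4x4
c4 = [[2.0] * 2 for _ in range(2)]  # 2x2
c5 = [[3.0]]                        # coarsest, 1x1
p3, p4, p5 = fuse_top_down([c3, c4, c5])
```

In this toy setting each finer level receives the sum of its own features and the upsampled coarser ones, which is how low-level, high-resolution features can be reused alongside higher-level semantics.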
Received: 2018-04-27    Published: 2019-03-04
CLC:  TP 391  
Corresponding author: Lei ZHAO     E-mail: bytelin@qq.com; cszhl@zju.edu.cn
About the author: Zhi-jie LIN (born 1980), male, Ph.D., works on deep learning and image processing. orcid.org/0000-0002-8605-2834. E-mail: bytelin@qq.com

Cite this article:

Zhi-jie LIN, Zhuang LUO, Lei ZHAO, Dong-ming LU. Multi-scale convolution target detection algorithm with feature pyramid[J]. Journal of Zhejiang University (Engineering Science), 2019, 53(3): 533-540.

Link to this article:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2019.03.014
http://www.zjujournals.com/eng/CN/Y2019/V53/I3/533

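Fig. 1  Structure of the multi-scale fully convolutional target detection network
Fig. 2  Flowchart of the non-maximum suppression algorithm
Fig. 3  Bounding-box location refinement module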
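
For readers unfamiliar with the step named in the flowchart caption, the following is a minimal sketch of standard greedy non-maximum suppression. It is not the specific variant proposed in the paper, whose details are not given on this page, and it does not include the location refinement step. Boxes are assumed to be `(x1, y1, x2, y2)` tuples, and the 0.5 threshold is an assumed default.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Toy example: the second box heavily overlaps the first and is suppressed.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
```

Here the first two boxes overlap with an IoU of 81/119, which exceeds the threshold, so only the higher-scoring one and the distant third box remain.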
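Network model                             Dataset                    Pave / %
Multi-scale fully convolutional network   PASCAL VOC 2007            71.2
Faster-RCNN                               PASCAL VOC 2007            68.8
Mask-RCNN                                 PASCAL VOC 2007            69.36
Multi-scale fully convolutional network   PASCAL VOC 2012            67.1
Faster-RCNN                               PASCAL VOC 2012            66.5
Mask-RCNN                                 PASCAL VOC 2012            66.9
Multi-scale fully convolutional network   PASCAL VOC 2007 + 2012     74.8
Faster-RCNN                               PASCAL VOC 2007 + 2012     72.5
Mask-RCNN                                 PASCAL VOC 2007 + 2012     73.26
Tab. 1  Comparison of the average precision (Pave) of different network models on standard datasets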
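Fig. 4  Recall versus intersection over union (IoU) for the multi-scale fully convolutional network and Faster-RCNN on standard datasets
Fig. 5  Comparison of small-target detection results of the multi-scale fully convolutional network model and the Faster-RCNN method
Fig. 6  Recall versus IoU on the ancient painting image dataset
Fig. 7  Comparison of target detection results of the multi-scale fully convolutional network model and the Fast-RCNN method on the ancient painting image dataset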
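
The recall-versus-IoU curves named in the captions can be understood through a simplified sketch. It assumes that a ground-truth box counts as recalled when at least one predicted box overlaps it with IoU at or above the threshold; the paper's exact evaluation protocol is not stated on this page, and the boxes and the function names `box_iou` and `recall_at_iou` are hypothetical.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_iou(gt_boxes, pred_boxes, threshold):
    """Fraction of ground-truth boxes covered by some prediction at this IoU."""
    if not gt_boxes:
        return 0.0
    hits = sum(
        1 for g in gt_boxes
        if any(box_iou(g, p) >= threshold for p in pred_boxes)
    )
    return hits / len(gt_boxes)


# Toy data: one exact match and one prediction with IoU 0.8.
gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
pred = [(0, 0, 10, 10), (20, 20, 30, 28)]
curve = [(t, recall_at_iou(gt, pred, t)) for t in (0.5, 0.7, 0.9)]
```

As the threshold rises, loosely localized predictions stop counting as hits, so recall can only stay level or fall; a detector with better localization keeps its recall at stricter thresholds.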