Computer Technology

Multi-scale convolution target detection algorithm with feature pyramid
Zhi-jie LIN1,2, Zhuang LUO2, Lei ZHAO2,*, Dong-ming LU2
1. School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou 310023, China
2. School of Computer Science, Zhejiang University, Hangzhou 310027, China

Abstract A feature pyramid multi-scale network structure was built on the region proposal network and combined with fully convolutional operations to detect small targets and class-independent targets in images. To improve the detection accuracy for small targets, a three-layer pyramid network based on lateral connection fusion was constructed, which makes full use of the low-semantic-level convolutional features of the image. To improve the robustness of class-independent target detection, a dedicated non-maximum suppression algorithm was proposed that eliminates redundant target windows when filtering overlapping targets and refines the location of the remaining windows. Experimental results on PASCAL VOC 2007, PASCAL VOC 2012 and an ancient painting dataset show that the proposed algorithm achieves higher detection accuracy than existing algorithms for small targets, multi-scale targets and class-independent targets.
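The two building blocks named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the fusion step assumes an FPN-style scheme (nearest-neighbour 2x upsampling followed by element-wise addition) on single-channel feature maps, and the suppression step is the standard greedy non-maximum suppression applied without class labels, without the window refinement the paper proposes. The function names `upsample2x`, `fuse_top_down`, `iou` and `class_independent_nms`, the 0.5 threshold and the (x1, y1, x2, y2) box layout are all hypothetical choices made for illustration.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2D list-of-lists feature map."""
    return [[v for v in row for _ in range(2)] for row in fmap for _ in range(2)]


def fuse_top_down(levels):
    """Fuse pyramid levels from coarsest to finest (assumed FPN-style).

    `levels` lists 2D maps, finest first; each level is assumed to have
    half the height and width of the level before it.
    """
    fused = [levels[-1]]  # the coarsest level is kept unchanged
    for lateral in reversed(levels[:-1]):
        up = upsample2x(fused[0])
        # Lateral fusion: add the upsampled coarse map to the finer map.
        merged = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(lateral, up)]
        fused.insert(0, merged)
    return fused


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def class_independent_nms(boxes, scores, iou_threshold=0.5):
    """Standard greedy NMS that ignores class labels; returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any
        # higher-scoring box that has already been kept.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Three-level pyramid of constant maps: 4x4, 2x2 and 1x1.
pyramid = [[[1] * 4 for _ in range(4)], [[1] * 2 for _ in range(2)], [[1]]]
fused = fuse_top_down(pyramid)

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
kept = class_independent_nms(boxes, [0.9, 0.8, 0.7])
```

In this toy input the second box overlaps the first with an IoU of about 0.68, so only the first and third boxes survive, while the finest fused map accumulates contributions from all three pyramid levels.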

Received: 27 April 2018
Published: 04 March 2019

Corresponding Authors:
Lei ZHAO
E-mail: bytelin@qq.com;cszhl@zju.edu.cn
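Feature pyramid multi-scale fully convolutional target detection algorithm
Based on the region proposal network, a feature pyramid multi-scale network structure is built and combined with fully convolutional operations to detect small targets and class-independent targets. To improve the detection accuracy for small targets in images, a three-layer pyramid network based on lateral connection fusion is constructed, making full use of image convolutional features with a relatively low semantic level. To improve the robustness of class-independent image target detection, a dedicated non-maximum suppression algorithm is proposed that eliminates redundant target windows when filtering overlapping targets and refines the positions of the target windows. Experimental results on PASCAL VOC 2007, PASCAL VOC 2012 and an ancient painting dataset show that the proposed algorithm achieves higher detection accuracy than existing algorithms for small targets, multi-scale targets and class-independent targets.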
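Keywords:
image target detection,
image feature pyramid,
multi-scale fully convolutional network,
small target detection,
class-independent target detection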