Traffic sign detection and recognition based on residual single shot multibox detector model
Shu-fang ZHANG, Tong ZHU
School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
Abstract Existing target detection methods are suited only to large traffic signs of a few specific types and perform poorly on images of complex traffic scenes. To address this, ResNet101, which strongly resists degradation, was used as the base network, and a residual single shot multibox detector (SSD) model was built by adding several convolution layers to it, so that multi-scale block detection could be performed on high-resolution traffic images. A coarse-to-fine strategy was adopted to speed up detection by skipping prediction on pure-background image blocks: the target range was first narrowed using the initial detection results on medium-scale image blocks, and the remaining blocks within that range were then detected. All block results were mapped back to the original image, and non-maximum suppression was applied to achieve accurate recognition. Experimental results showed that the proposed method achieved 94% overall accuracy and 95% overall recall on the public traffic sign dataset Tsinghua-Tencent 100K. The model detected traffic signs of different sizes and shapes in multi-resolution images well and was robust.
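The final step described in the abstract, mapping block results back to the original image and applying non-maximum suppression, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the box layout, the `map_to_original` and `nms` helper names, the crop-then-resize convention, the per-class suppression, and the IoU threshold of 0.5 are all assumptions, since the abstract does not specify them.

```python
from typing import List, Tuple

# Hypothetical box layout: (x1, y1, x2, y2, score, label), in pixels.
Box = Tuple[float, float, float, float, float, str]


def map_to_original(box: Box, block_x: float, block_y: float, scale: float) -> Box:
    """Map a box predicted inside an image block back to original-image coordinates.

    Assumes the block was cropped with its top-left corner at (block_x, block_y)
    and then resized by `scale` before detection (an illustrative convention).
    """
    x1, y1, x2, y2, score, label = box
    return (x1 / scale + block_x, y1 / scale + block_y,
            x2 / scale + block_x, y2 / scale + block_y, score, label)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], iou_threshold: float = 0.5) -> List[Box]:
    """Greedy per-class non-maximum suppression (threshold is an assumption)."""
    kept: List[Box] = []
    # Visit boxes from highest to lowest score; keep a box only if it does not
    # overlap an already kept box of the same label by more than the threshold.
    for cand in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(cand[5] != k[5] or iou(cand, k) <= iou_threshold for k in kept):
            kept.append(cand)
    return kept


# Usage: the same sign detected in two overlapping blocks at different scales.
# The label "pl50" is only an illustrative class name.
a = map_to_original((100, 100, 140, 140, 0.9, "pl50"), 0, 0, 1.0)
b = map_to_original((76, 76, 156, 156, 0.8, "pl50"), 64, 64, 2.0)
merged = nms([a, b])
```

Because overlapping blocks at different scales can report the same sign more than once, suppression is applied only after every box is expressed in the coordinate frame of the original image.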
Received: 11 April 2018
Published: 17 May 2019
Keywords: traffic sign, residual single shot multibox detector (SSD) model, multi-scale block, detection, coarse-to-fine