Computer and Control Engineering

Scale differentiated text detection method focusing on hard examples
Hong LIN, Yao-yao LU
College of Computer Science and Technology, Wuhan University of Technology, Wuhan 430063, China

Abstract: Text detection accuracy is difficult to improve because the information in the middle feature layers of convolutional neural networks is under-used and because training does not distinguish between text scales or between hard and easy examples. To address this problem, a text detection method for natural scene images was proposed that is based on multi-channel refined feature fusion, focuses on hard examples, and differentiates text scales. Fusion layers of a multi-channel refined convolutional neural network were constructed to extract high-resolution feature maps. Text instances were divided into three scale ranges according to the length of the longer side of their labeled rectangular boxes and assigned to different proposal networks, each of which extracted the corresponding proposals. A focal loss function was designed to concentrate learning on hard examples, improving the expressive ability of the model and yielding the target text bounding boxes. Experiments showed that the proposed multi-channel refined feature extraction method achieved a high text recall on the COCO-Text dataset. The scale-differentiated text detection method focusing on hard examples achieved detection accuracies of 0.89 and 0.83 on the ICDAR2013 and ICDAR2015 standard datasets, respectively. Compared with methods such as CTPN and RRPN, the proposed method was more robust on multi-scale, multi-orientation natural scene images.

Received: 07 September 2018
Published: 13 August 2019

Keywords: deep learning, natural scene, text detection, feature fusion, hard examples, focal loss
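
As a rough illustration of two steps named in the abstract, the sketch below assigns a text box to one of three scale ranges by its longer side and computes a binary focal loss for a single prediction. It is a minimal sketch under stated assumptions, not the authors' implementation: the thresholds `SMALL_MAX` and `MEDIUM_MAX`, the function names, and the default `alpha` and `gamma` values are hypothetical, since the abstract gives neither the scale boundaries nor the exact form of the designed loss. The loss shown is the commonly used form, -alpha_t * (1 - p_t)^gamma * log(p_t).

```python
import math

# Hypothetical scale boundaries in pixels; the abstract does not state the real ones.
SMALL_MAX = 64
MEDIUM_MAX = 256


def scale_bucket(box):
    """Return 0, 1 or 2 (small, medium, large) for an (x1, y1, x2, y2) box,
    based on the length of its longer side."""
    x1, y1, x2, y2 = box
    longer = max(abs(x2 - x1), abs(y2 - y1))
    if longer <= SMALL_MAX:
        return 0
    if longer <= MEDIUM_MAX:
        return 1
    return 2


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss for one prediction.

    p is the predicted probability of the text class, y is the label (0 or 1).
    alpha and gamma are assumed defaults, not values taken from the paper.
    """
    p = min(max(p, eps), 1.0 - eps)        # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p         # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)^gamma shrinks the loss of well-classified (easy) examples
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    print(scale_bucket((0, 0, 100, 20)))   # longer side is 100 pixels
    print(focal_loss(0.9, 1))              # easy positive example
    print(focal_loss(0.1, 1))              # hard positive example
```

With these assumed settings, a positive example predicted at 0.1 contributes a far larger loss than one predicted at 0.9, which is the sense in which such a loss focuses training on hard examples. In a scale-differentiated detector, the bucket index would select which proposal network receives the instance.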