Journal of ZheJiang University (Engineering Science)  2019, Vol. 53 Issue (8): 1506-1516    DOI: 10.3785/j.issn.1008-973X.2019.08.009
Computer and Control Engineering     
Scale differentiated text detection method focusing on hard examples
Hong LIN, Yao-yao LU
College of Computer Science and Technology, Wuhan University of Technology, Wuhan 430063, China

Abstract  

The accuracy of text detection is difficult to improve because the information in the middle feature layers of convolutional neural networks is under-utilized and because text instances of different scales and of different hard-easy degrees are learned without distinction. To address this problem, a text detection method for natural scene images based on multi-channel refined feature fusion was proposed, which focuses on hard examples and distinguishes between scales. Fusion layers of a multi-channel refined convolutional neural network were constructed to extract high-resolution feature maps. According to the length of the longer side of the text label rectangles, text instances were divided into three scale ranges and distributed to different proposal networks to extract the corresponding proposals. A focal loss function was designed to concentrate learning on hard examples, improving the expressive ability of the model and yielding the target text bounding boxes. Experiments showed that the proposed multi-channel refined feature extraction method achieved high text recall on the COCO-Text dataset. The detection accuracies of the scale-differentiated, hard-example-focused text detection method on the ICDAR2013 and ICDAR2015 standard datasets were 0.89 and 0.83, respectively. Compared with CTPN and RRPN, the proposed method is more robust on multi-scale, multi-orientation natural scene images.
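The focal loss the method builds on (Lin et al. [18]) down-weights well-classified examples so that training is dominated by hard ones. A minimal sketch for a single binary prediction, using the commonly cited defaults alpha = 0.25 and gamma = 2 rather than the paper's tuned hard-easy factors:

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Focal loss for one binary prediction (after Lin et al. [18]).

    p: predicted probability of the positive (text) class.
    y: ground-truth label, 1 for text and 0 for background.
    The modulating factor (1 - p_t) ** gamma shrinks the loss of
    well-classified (easy) examples, so gradients come mostly from
    hard examples.
    """
    p_t = p if y == 1 else 1.0 - p          # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With gamma = 0 this reduces to ordinary weighted cross-entropy; larger gamma suppresses the contribution of easy examples more strongly.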



Key words: deep learning; natural scene; text detection; feature fusion; hard examples; focal loss
Received: 07 September 2018      Published: 13 August 2019
CLC:  TP 391  
Cite this article:

Hong LIN,Yao-yao LU. Scale differentiated text detection method focusing on hard examples. Journal of ZheJiang University (Engineering Science), 2019, 53(8): 1506-1516.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2019.08.009     OR     http://www.zjujournals.com/eng/Y2019/V53/I8/1506


Fig.1 Module of refined fusion with multi-scale features
Fig.2 Flow chart of text detection in natural scene image
Fig.3 Structure map of scale differentiated text detection network
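The scale-differentiated routing sketched in Fig.3 amounts to bucketing each labelled text box by its longer side and sending it to the matching proposal network. A minimal sketch; the threshold values below are hypothetical placeholders, not the paper's actual boundaries:

```python
def scale_bucket(box_w, box_h, small_max=32, medium_max=96):
    """Assign a text box to one of three scale ranges by its longer side.

    small_max and medium_max are illustrative thresholds only; the paper
    divides instances into three ranges by the longer side of the label
    rectangle but this sketch does not reproduce its exact boundaries.
    """
    longer = max(box_w, box_h)
    if longer <= small_max:
        return "small"
    if longer <= medium_max:
        return "medium"
    return "large"
```

Each bucket's instances are then handed to the proposal network tuned for that scale range.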
Method $R_{100}^{0.5}$ $R_{100}^{0.7}$ $\bar R_{100}$ $R_{200}^{0.5}$ $R_{200}^{0.7}$ $\bar R_{200}$ $R_{300}^{0.5}$ $R_{300}^{0.7}$ $\bar R_{300}$
Faster RCNN-RPN 70.8 28.3 38.7 76.1 30.8 39.0 83.6 33.8 41.7
SSD-RPN 71.4 37.6 39.7 77.1 39.5 45.1 86.7 48.3 47.8
PVANet-RPN 71.7 38.1 40.2 78.3 40.2 43.3 87.6 43.4 44.9
FPN-RPN 68.1 39.9 41.6 80.3 40.9 45.2 88.6 48.8 49.2
Baseline1 72.3 37.3 43.2 81.0 45.0 45.5 88.9 47.0 48.9
Baseline2 81.5 40.0 43.5 82.7 45.1 45.8 88.9 49.0 49.2
Baseline3 81.8 40.2 43.6 83.0 45.3 46.1 89.3 49.2 49.3
RefineScale-RPN 76.8 41.0 43.5 84.6 45.5 46.4 89.8 49.3 49.5
Tab.1 Recall of different region proposal networks on dataset COCO-Text
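In Tab.1, $R_{n}^{t}$ denotes recall when the top $n$ proposals are kept and a ground-truth box counts as recalled if some proposal overlaps it with intersection-over-union (IoU) of at least $t$. A minimal sketch of the standard IoU computation for axis-aligned boxes:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)  # zero if boxes are disjoint
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```

Raising the threshold from 0.5 to 0.7 demands tighter localization, which is why the $R^{0.7}$ columns are uniformly lower than the $R^{0.5}$ columns.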
Fig.4 Recall of different proposal methods on dataset COCO-Text
Fig.5 Effect of hard-easy factors $\gamma_1, \gamma_2$ on F
Fig.6 Determination of optimal value of hard-easy factor $\gamma_1$
Method R P F
RTD[8] 0.66 0.88 0.76
RTLF[9] 0.72 0.82 0.77
FASText[10] 0.69 0.84 0.77
CTPN[7] 0.83 0.93 0.88
RRD[30] 0.86 0.92 0.89
RRPN[11] 0.88 0.95 0.91
RefineScale-RPN 0.88 0.90 0.89
Tab.2 Comparison of common evaluation indexes for different text detection methods on dataset ICDAR2013
Method R P F
CTPN[7] 0.52 0.74 0.61
RTLF[9] 0.82 0.72 0.77
RRPN[11] 0.77 0.84 0.80
EAST[12] 0.78 0.83 0.81
RRD[30] 0.80 0.88 0.84
RefineScale-RPN 0.85 0.81 0.83
Tab.3 Comparison of common evaluation indexes for different text detection methods on dataset ICDAR2015
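The R, P, and F columns in Tab.2 and Tab.3 are the standard recall, precision, and their harmonic-mean F-measure:

```python
def f_measure(recall, precision):
    """Harmonic mean of recall and precision (the F column in Tab.2-3)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)
```

For example, the last row of Tab.3 gives f_measure(0.85, 0.81), which rounds to 0.83.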
Fig.7 Text detection results of proposed method on typical test images
[1]   YAO Cong. Research on text detection and recognition in natural images [D]. Wuhan: Huazhong University of Science and Technology, 2014.
[2]   YANG Fei. A survey of text detection in natural scene images [J]. Electronic Design Engineering, 2016, 24(24): 165-168.
[3]   DONOSER M, BISCHOF H. Efficient maximally stable extremal region (MSER) tracking [C]// Computer Vision and Pattern Recognition. New York: IEEE, 2006: 553-560.
[4]   EPSHTEIN B, OFEK E, WEXLER Y. Detecting text in natural scenes with stroke width transform [C]// Computer Vision and Pattern Recognition. San Francisco: IEEE, 2010: 2963-2970.
[5]   KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks [C]// International Conference on Neural Information Processing Systems. Lake Tahoe: ACM, 2012: 1097-1105.
[6]   ZHOU Fei-yan, JIN Lin-peng, DONG Jun. Review of convolutional neural networks [J]. Chinese Journal of Computers, 2017, 40(6): 1229-1251. doi: 10.11897/SP.J.1016.2017.01229
[7]   TIAN Z, HUANG W, HE T, et al. Detecting text in natural image with connectionist text proposal network [C]// European Conference on Computer Vision. Amsterdam: Springer, 2016: 56-72.
[8]   YIN X C, YIN X, HUANG K, et al. Robust text detection in natural scene images [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(5): 970-983. doi: 10.1109/TPAMI.2013.182
[9]   NEUMANN L, MATAS J. Real-time lexicon-free scene text localization and recognition [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(9): 1872-1885. doi: 10.1109/TPAMI.2015.2496234
[10]   BUTA M, NEUMANN L, MATAS J. Fastext: efficient unconstrained scene text detector [C]// Proceedings of the IEEE International Conference on Computer Vision. Santiago: IEEE, 2015: 1206-1214.
[11]   MA J, SHAO W, YE H, et al. Arbitrary-oriented scene text detection via rotation proposals [J]. IEEE Transactions on Multimedia, 2018, 20(11): 3111-3122.
[12]   ZHOU X, YAO C, WEN H, et al. EAST: an efficient and accurate scene text detector [C]// Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 2642-2651.
[13]   LIAO M, SHI B, BAI X, et al. TextBoxes: a fast text detector with a single deep neural network [C]// Thirty-First AAAI Conference on Artificial Intelligence. San Francisco: AAAI, 2017: 4161-4167.
[14]   HONG S, ROH B, KIM K H, et al. PVANet: lightweight deep neural networks for real-time object detection [C]// Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016. arXiv: 1611.08588.
[15]   DENG D, LIU H, LI X, et al. PixelLink: detecting scene text via instance segmentation [C]// Thirty-Second AAAI Conference on Artificial Intelligence. San Francisco: AAAI, 2018: 6773-6780.
[16]   SHRIVASTAVA A, GUPTA A, GIRSHICK R. Training region-based object detectors with online hard example mining [C]// Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 761-769.
[17]   ANTHIMOPOULOS M, GATOS B, PRATIKAKIS I. A two-stage scheme for text detection in video images [J]. Image and Vision Computing, 2010, 28(9): 1413-1426. doi: 10.1016/j.imavis.2010.03.004
[18]   LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, PP(99): 2999-3007.
[19]   LIN T Y, DOLLAR P, GIRSHICK R, et al. Feature pyramid networks for object detection [C]// Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 936-944.
[20]   HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition [C]// Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 770-778.
[21]   LIN G, MILAN A, SHEN C, et al. RefineNet: multi-path refinement networks for high-resolution semantic segmentation [C]// Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017, 1925-1934.
[22]   REN S, HE K, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks [C]// Advances in Neural Information Processing Systems. Montreal: ACM, 2015: 91-99.
[23]   UIJLINGS J R R, VAN DE SANDE K E A, GEVERS T, et al. Selective search for object recognition [J]. International Journal of Computer Vision, 2013, 104(2): 154-171. doi: 10.1007/s11263-013-0620-5
[24]   ZITNICK C L, DOLLAR P. Edge boxes: locating object proposals from edges [C]// European Conference on Computer Vision. Zurich: Springer, 2014: 391-405.
[25]   VEIT A, MATERA T, NEUMANN L, et al. Coco-text: dataset and benchmark for text detection and recognition in natural images [C]// Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016. arXiv: 1601.07140.
[26]   KARATZAS D, SHAFAIT F, UCHIDA S, et al. ICDAR 2013 robust reading competition [C]// 12th International Conference on Document Analysis and Recognition. Washington: IEEE, 2013: 1484-1493.
[27]   KARATZAS D, GOMEZ-BIGORDA L, NICOLAOU A, et al. ICDAR 2015 competition on robust reading [C]// 13th International Conference on Document Analysis and Recognition. Nancy: IEEE, 2015: 1156-1160.
[28]   WOLF C, JOLION J M. Object count/area graphs for the evaluation of object detection and segmentation algorithms [J]. International Journal of Document Analysis and Recognition, 2006, 8(4): 280-296. doi: 10.1007/s10032-006-0014-0
[29]   LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector [C]// European Conference on Computer Vision. Amsterdam: Springer, 2016: 21-37.
[30]   LIAO M, ZHU Z, SHI B, et al. Rotation-sensitive regression for oriented scene text detection [C]// Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 5909-5918.