Journal of Zhejiang University (Engineering Science)  2019, Vol. 53 Issue (8): 1506-1516    DOI: 10.3785/j.issn.1008-973X.2019.08.009
Computer and Control Engineering
Scale differentiated text detection method focusing on hard examples
Hong LIN, Yao-yao LU
College of Computer Science and Technology, Wuhan University of Technology, Wuhan 430063, China
Abstract:

The accuracy of text detection is difficult to improve because the information in the intermediate feature layers of convolutional neural networks is under-utilized, and because learning does not distinguish between scales or between hard and easy examples. To address this problem, a text detection method for natural scene images based on multi-channel refined feature fusion was proposed, which focuses on hard examples and distinguishes between scales. Fusion layers of a multi-channel refined convolutional neural network were constructed to extract high-resolution feature maps. According to the length of the longer side of the ground-truth text rectangle, text instances were divided into three scale ranges and distributed to different region proposal networks to extract the corresponding proposals. A focal loss function was designed to concentrate learning on hard examples, improving the expressive ability of the model and yielding the target text bounding boxes. Experiments showed that the text recall of the proposed multi-channel refined feature extraction method on the COCO-Text dataset was higher than that of comparable proposal extraction networks. The detection accuracies of the proposed scale-differentiated text detection method focusing on hard examples on the ICDAR2013 and ICDAR2015 benchmark datasets were 0.89 and 0.83, respectively. Compared with CTPN, RRPN and other methods, the proposed method is more robust on multi-scale, multi-orientation natural scene images.

Key words: deep learning; natural scene; text detection; feature fusion; hard examples; focal loss
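The abstract divides text instances into three scale ranges by the longer side of the labeled rectangle and routes each range to its own proposal extraction network. A minimal sketch of that routing step, assuming illustrative pixel thresholds (the paper's actual range boundaries are not given on this page):

```python
# Sketch of assigning a text instance to one of three scale-specific
# proposal networks by the longer side of its ground-truth box.
# The thresholds small_max=32 and medium_max=96 are assumptions for
# illustration, not values from the paper.

def longer_side(box):
    """Longer side of an axis-aligned box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return max(x2 - x1, y2 - y1)

def scale_branch(box, small_max=32, medium_max=96):
    """Return which of the three proposal-network branches handles this box."""
    s = longer_side(box)
    if s <= small_max:
        return "small"
    elif s <= medium_max:
        return "medium"
    return "large"
```

Routing by the longer side rather than by area keeps long, thin text lines (common in scene text) in the branch whose anchors match their extent.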
Received: 2018-09-07 Published online: 2019-08-13
CLC:  TP 391  
About the author: LIN Hong (1965—), female, associate professor; research interests: deep learning and language compilation. orcid.org/0000-0001-5599-2877. E-mail: linhong@whut.edu.cn
Cite this article:

Hong LIN, Yao-yao LU. Scale differentiated text detection method focusing on hard examples [J]. Journal of Zhejiang University (Engineering Science), 2019, 53(8): 1506-1516.

Link to this article:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2019.08.009        http://www.zjujournals.com/eng/CN/Y2019/V53/I8/1506

Fig. 1  Multi-scale feature refinement fusion module
Fig. 2  Flowchart of text detection in natural scene images
Fig. 3  Architecture of the scale-differentiated text detection network
Method $R_{100}^{0.5}$ $R_{100}^{0.7}$ $\bar R_{100}$ $R_{200}^{0.5}$ $R_{200}^{0.7}$ $\bar R_{200}$ $R_{300}^{0.5}$ $R_{300}^{0.7}$ $\bar R_{300}$
Faster RCNN-RPN 70.8 28.3 38.7 76.1 30.8 39.0 83.6 33.8 41.7
SSD-RPN 71.4 37.6 39.7 77.1 39.5 45.1 86.7 48.3 47.8
PVANet-RPN 71.7 38.1 40.2 78.3 40.2 43.3 87.6 43.4 44.9
FPN-RPN 68.1 39.9 41.6 80.3 40.9 45.2 88.6 48.8 49.2
Baseline1 72.3 37.3 43.2 81.0 45.0 45.5 88.9 47.0 48.9
Baseline2 81.5 40.0 43.5 82.7 45.1 45.8 88.9 49.0 49.2
Baseline3 81.8 40.2 43.6 83.0 45.3 46.1 89.3 49.2 49.3
RefineScale-RPN 76.8 41.0 43.5 84.6 45.5 46.4 89.8 49.3 49.5
Table 1  Recall (%) of different proposal extraction networks on the COCO-Text dataset
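The column heads of Table 1 read as $R_k^t$: recall over the top-$k$ proposals, counting a ground-truth box as recalled if some proposal overlaps it with IoU at least $t$ (e.g. $R_{100}^{0.5}$). A minimal sketch of computing such a metric, with hypothetical helper names:

```python
# Sketch of the R_k^t recall metric used in Table 1: fraction of
# ground-truth boxes matched by the top-k proposals at IoU >= t.
# Function and variable names are illustrative, not from the paper.

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def recall_at(gt_boxes, proposals, k, thresh):
    """R_k^thresh: proposals are assumed sorted by descending score."""
    top = proposals[:k]
    hit = sum(1 for g in gt_boxes if any(iou(g, p) >= thresh for p in top))
    return hit / len(gt_boxes) if gt_boxes else 0.0
```

The drop from the $t{=}0.5$ columns to the $t{=}0.7$ columns in Table 1 shows how much recall depends on localization tightness, not just on whether text is found at all.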
Fig. 4  Recall of different proposal extraction methods on the COCO-Text dataset
Fig. 5  Effect of hard-easy factors $\gamma_1, \gamma_2$ on F
Fig. 6  Determination of the optimal value of hard-easy factor $\gamma_1$
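Figs. 5 and 6 tune the hard-easy factors $\gamma_1, \gamma_2$ of the focal loss. As background, a sketch of the standard single-factor focal loss $\mathrm{FL}(p_t) = -\alpha\,(1-p_t)^{\gamma}\log p_t$ that such a design builds on; the paper's two-factor variant is not reproduced here, and the parameter values below are illustrative defaults:

```python
# Sketch of the standard binary focal loss. The modulating factor
# (1 - p_t)^gamma shrinks the loss of well-classified (easy) examples
# so that training focuses on hard ones. alpha and gamma values are
# illustrative, not the paper's tuned hard-easy factors.
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Focal loss for predicted probability p of class 1 and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p          # probability of the true class
    weight = alpha if y == 1 else 1.0 - alpha
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)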
Method R P F
RTD[8] 0.66 0.88 0.76
RTLF[9] 0.72 0.82 0.77
FASText[10] 0.69 0.84 0.77
CTPN[7] 0.83 0.93 0.88
RRD[30] 0.86 0.92 0.89
RRPN[11] 0.88 0.95 0.91
RefineScale-RPN 0.88 0.90 0.89
Table 2  Comparison of common evaluation metrics of different text detection methods on ICDAR2013
Method R P F
CTPN[7] 0.52 0.74 0.61
RTLF[9] 0.82 0.72 0.77
RRPN[11] 0.77 0.84 0.80
EAST[12] 0.78 0.83 0.81
RRD[30] 0.80 0.88 0.84
RefineScale-RPN 0.85 0.81 0.83
Table 3  Comparison of common evaluation metrics of different text detection methods on ICDAR2015
Fig. 7  Text detection results of the proposed method on typical test images
1 YAO Cong. Research on text detection and recognition in natural images [D]. Wuhan: Huazhong University of Science and Technology, 2014. (in Chinese)
2 YANG Fei. A review of text detection in natural scene images [J]. Electronic Design Engineering, 2016, 24(24): 165-168. (in Chinese)
3 DONOSER M, BISCHOF H. Efficient maximally stable extremal region (MSER) tracking [C]// Computer Vision and Pattern Recognition. New York: IEEE, 2006: 553-560.
4 EPSHTEIN B, OFEK E, WEXLER Y. Detecting text in natural scenes with stroke width transform [C]// Computer Vision and Pattern Recognition. San Francisco: IEEE, 2010: 2963-2970.
5 KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks [C]// International Conference on Neural Information Processing Systems. Lake Tahoe: ACM, 2012: 1097-1105.
6 ZHOU Fei-yan, JIN Lin-peng, DONG Jun. Review of convolutional neural networks [J]. Chinese Journal of Computers, 2017, 40(6): 1229-1251. doi: 10.11897/SP.J.1016.2017.01229 (in Chinese)
7 TIAN Z, HUANG W, HE T, et al. Detecting text in natural image with connectionist text proposal network [C]// European Conference on Computer Vision. Amsterdam: Springer, 2016: 56-72.
8 YIN X C, YIN X, HUANG K, et al. Robust text detection in natural scene images [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(5): 970-983
doi: 10.1109/TPAMI.2013.182
9 NEUMANN L, MATAS J. Real-time lexicon-free scene text localization and recognition [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(9): 1872-1885
doi: 10.1109/TPAMI.2015.2496234
10 BUTA M, NEUMANN L, MATAS J. FASText: efficient unconstrained scene text detector [C]// Proceedings of the IEEE International Conference on Computer Vision. Santiago: IEEE, 2015: 1206-1214.
11 MA J, SHAO W, YE H, et al. Arbitrary-oriented scene text detection via rotation proposals [J]. IEEE Transactions on Multimedia, 2018, 20(11): 3111-3122.
12 ZHOU X, YAO C, WEN H, et al. EAST: an efficient and accurate scene text detector [C]// Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 2642-2651.
13 LIAO M, SHI B, BAI X, et al. TextBoxes: a fast text detector with a single deep neural network [C]// Thirty-First AAAI Conference on Artificial Intelligence. San Francisco: AAAI, 2017: 4161-4167.
14 HONG S, ROH B, KIM K H, et al. PVANet: lightweight deep neural networks for real-time object detection [C]// Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016. arXiv: 1611.08588.
15 DENG D, LIU H, LI X, et al. PixelLink: detecting scene text via instance segmentation [C]// Thirty-Second AAAI Conference on Artificial Intelligence. San Francisco: AAAI, 2018: 6773-6780.
16 SHRIVASTAVA A, GUPTA A, GIRSHICK R. Training region-based object detectors with online hard example mining [C]// Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 761-769.
17 ANTHIMOPOULOS M, GATOS B, PRATIKAKIS I. A two-stage scheme for text detection in video images [J]. Image and Vision Computing, 2010, 28(9): 1413-1426
doi: 10.1016/j.imavis.2010.03.004
18 LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, PP(99): 2999-3007
19 LIN T Y, DOLLAR P, GIRSHICK R, et al. Feature pyramid networks for object detection [C]// Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 936-944.
20 HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition [C]// Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 770-778.
21 LIN G, MILAN A, SHEN C, et al. RefineNet: multi-path refinement networks for high-resolution semantic segmentation [C]// Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017, 1925-1934.
22 REN S, HE K, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks [C]// Advances in Neural Information Processing Systems. Montreal: ACM, 2015: 91-99.
23 UIJLINGS J R R, VAN DE SANDE K E A, GEVERS T, et al. Selective search for object recognition [J]. International Journal of Computer Vision, 2013, 104(2): 154-171
doi: 10.1007/s11263-013-0620-5
24 ZITNICK C L, DOLLAR P. Edge boxes: locating object proposals from edges [C]// European Conference on Computer Vision. Zurich: Springer, 2014: 391-405.
25 VEIT A, MATERA T, NEUMANN L, et al. Coco-text: dataset and benchmark for text detection and recognition in natural images [C]// Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016. arXiv: 1601.07140.
26 KARATZAS D, SHAFAIT F, UCHIDA S, et al. ICDAR 2013 robust reading competition [C]// 12th International Conference on Document Analysis and Recognition. Washington: IEEE, 2013: 1484-1493.
27 KARATZAS D, GOMEZ-BIGORDA L, NICOLAOU A, et al. ICDAR 2015 competition on robust reading [C]// 13th International Conference on Document Analysis and Recognition. Nancy: IEEE, 2015: 1156-1160.
28 WOLF C, JOLION J M. Object count/area graphs for the evaluation of object detection and segmentation algorithms [J]. International Journal of Document Analysis and Recognition, 2006, 8(4): 280-296
doi: 10.1007/s10032-006-0014-0
29 LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector [C]// European Conference on Computer Vision. Amsterdam: Springer, 2016: 21-37.