The accuracy of text detection is difficult to improve because the information in the intermediate feature layers of convolutional neural networks is under-utilized and because training does not distinguish between text of different scales or between hard and easy examples. To address this problem, a text detection method for natural scene images based on multi-channel refined feature fusion was proposed, which distinguishes text scales and focuses on hard examples. Fusion layers of a multi-channel refined convolutional neural network were constructed to extract high-resolution feature maps. According to the length of the longer side of the ground-truth text rectangles, text instances were divided into three scale ranges and assigned to separate proposal networks to extract the corresponding proposals. A focal loss function was designed to concentrate learning on hard examples, improving the representational ability of the model and yielding the final text bounding boxes. Experiments showed that the proposed multi-channel refined feature extraction method achieved high text recall on the COCO-Text dataset. The detection accuracy (F-measure) of the scale-differentiated, hard-example-focused text detection method was 0.89 on the ICDAR2013 benchmark and 0.83 on the ICDAR2015 benchmark. Compared with CTPN and RRPN, the proposed method is more robust to multi-scale and multi-oriented text in natural scene images.
Hong LIN, Yao-yao LU. Scale differentiated text detection method focusing on hard examples. Journal of Zhejiang University (Engineering Science), 2019, 53(8): 1506-1516.
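To make the scale-differentiation step concrete, the sketch below assigns each ground-truth text box to one of three scale ranges by the length of its longer side and groups the boxes per proposal branch. It is a minimal illustration only: the bucket thresholds (48 and 120 pixels) and the function names are assumptions, since the paper's cut-off values are not given in this excerpt.

```python
# Minimal sketch of the scale-differentiation step: each ground-truth text
# rectangle is routed to one of three proposal branches according to the
# length of its longer side. Thresholds are illustrative assumptions.

SMALL_MAX = 48    # hypothetical upper bound (pixels) of the "small" range
MEDIUM_MAX = 120  # hypothetical upper bound (pixels) of the "medium" range

def scale_bucket(box):
    """Return 0/1/2 for small/medium/large given a box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    longer_side = max(x2 - x1, y2 - y1)
    if longer_side <= SMALL_MAX:
        return 0
    if longer_side <= MEDIUM_MAX:
        return 1
    return 2

def split_by_scale(boxes):
    """Group boxes into the three ranges, one list per proposal branch."""
    buckets = {0: [], 1: [], 2: []}
    for box in boxes:
        buckets[scale_bucket(box)].append(box)
    return buckets

# Example: three boxes whose longer sides are 30, 100 and 300 pixels.
print(split_by_scale([(0, 0, 30, 12), (0, 0, 100, 20), (0, 0, 300, 40)]))
```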
Fig. 1 Module of refined fusion with multi-scale features
Fig. 2 Flow chart of text detection in natural scene image
Fig. 3 Structure map of scale differentiated text detection network
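The refined fusion module of Fig. 1 and the fusion layers mentioned in the abstract follow the general top-down fusion idea of FPN [19] and RefineNet [21]: a coarse, semantically strong feature map is upsampled and merged with a finer one to obtain high-resolution features. The sketch below shows that generic pattern only; it is not the paper's exact multi-channel refined fusion module.

```python
import numpy as np

# Generic top-down feature fusion (in the spirit of FPN [19] / RefineNet [21]):
# upsample a coarse (C, H, W) feature map and merge it with a finer one of the
# same channel count. Illustrative only; not the paper's exact fusion module.

def upsample2x(feat):
    """Nearest-neighbour 2x upsampling along the two spatial axes."""
    return feat.repeat(2, axis=1).repeat(2, axis=2)

def fuse(coarse, fine):
    """Upsample the coarse map to the fine map's resolution and add them."""
    up = upsample2x(coarse)
    assert up.shape == fine.shape, "channels and spatial sizes must match"
    return up + fine

coarse = np.random.rand(256, 16, 16)  # low resolution, semantically strong
fine = np.random.rand(256, 32, 32)    # higher resolution, finer detail
print(fuse(coarse, fine).shape)       # (256, 32, 32)
```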
Tab. 1 Recall of different region proposal networks on dataset COCO-Text (values in %)

Method          | $R_{100}^{0.5}$ | $R_{100}^{0.7}$ | $\bar R_{100}$ | $R_{200}^{0.5}$ | $R_{200}^{0.7}$ | $\bar R_{200}$ | $R_{300}^{0.5}$ | $R_{300}^{0.7}$ | $\bar R_{300}$
Faster RCNN-RPN | 70.8 | 28.3 | 38.7 | 76.1 | 30.8 | 39.0 | 83.6 | 33.8 | 41.7
SSD-RPN         | 71.4 | 37.6 | 39.7 | 77.1 | 39.5 | 45.1 | 86.7 | 48.3 | 47.8
PVANet-RPN      | 71.7 | 38.1 | 40.2 | 78.3 | 40.2 | 43.3 | 87.6 | 43.4 | 44.9
FPN-RPN         | 68.1 | 39.9 | 41.6 | 80.3 | 40.9 | 45.2 | 88.6 | 48.8 | 49.2
Baseline1       | 72.3 | 37.3 | 43.2 | 81.0 | 45.0 | 45.5 | 88.9 | 47.0 | 48.9
Baseline2       | 81.5 | 40.0 | 43.5 | 82.7 | 45.1 | 45.8 | 88.9 | 49.0 | 49.2
Baseline3       | 81.8 | 40.2 | 43.6 | 83.0 | 45.3 | 46.1 | 89.3 | 49.2 | 49.3
RefineScale-RPN | 76.8 | 41.0 | 43.5 | 84.6 | 45.5 | 46.4 | 89.8 | 49.3 | 49.5
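A note on the headers of Tab. 1: in proposal-recall evaluations, $R_{N}^{t}$ is commonly read as the fraction of ground-truth text boxes matched by at least one of the top $N$ proposals at IoU threshold $t$, and $\bar R_{N}$ as the recall averaged over IoU thresholds. Under that reading, which is an assumption about the notation rather than a definition given in this excerpt, the metric can be computed as sketched below.

```python
# Sketch of proposal recall under the assumed reading of the Tab. 1 headers:
# R_N^t = share of ground-truth boxes covered by at least one of the top-N
# proposals with IoU >= t. Boxes are (x1, y1, x2, y2) tuples.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def proposal_recall(gt_boxes, proposals, top_n, iou_thresh):
    """Fraction of ground-truth boxes matched by any of the top_n proposals."""
    kept = proposals[:top_n]  # proposals assumed sorted by confidence
    matched = sum(
        1 for g in gt_boxes if any(iou(g, p) >= iou_thresh for p in kept)
    )
    return matched / len(gt_boxes) if gt_boxes else 0.0
```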
Fig. 4 Recall of different proposal methods on dataset COCO-Text
Fig. 5 Effect of hard-easy factors $\gamma_1$, $\gamma_2$ on F
Fig. 6 Determination of optimal value of hard-easy factor $\gamma_1$
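Fig. 5 and Fig. 6 tune the hard-easy factors $\gamma_1$, $\gamma_2$ of the loss. The paper's two-factor formulation is not reproduced in this excerpt; for reference, the sketch below implements the standard focal loss of Lin et al. [18], in which a single focusing factor $\gamma$ plays the same role of down-weighting easy examples.

```python
import math

# Standard focal loss of Lin et al. [18] for a binary label, shown only to
# illustrate how a focusing factor gamma suppresses the contribution of easy
# examples; the paper's (gamma_1, gamma_2) variant is not reproduced here.

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """p: predicted probability of the positive class; y: label in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, 1e-12))

# An easy positive (p = 0.95) contributes far less than a hard one (p = 0.3).
print(focal_loss(0.95, 1), focal_loss(0.3, 1))
```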
Tab. 2 Comparison of common evaluation indexes for different text detection methods on dataset ICDAR2013

Method          | R    | P    | F
RTD[8]          | 0.66 | 0.88 | 0.76
RTLF[9]         | 0.72 | 0.82 | 0.77
FASText[10]     | 0.69 | 0.84 | 0.77
CTPN[7]         | 0.83 | 0.93 | 0.88
RRD[30]         | 0.86 | 0.92 | 0.89
RRPN[11]        | 0.88 | 0.95 | 0.91
RefineScale-RPN | 0.88 | 0.90 | 0.89
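The F column in Tab. 2 (and in Tab. 3 below) is the usual harmonic mean of precision and recall; for example, the RefineScale-RPN row above is consistent with $F = 2PR/(P+R) = 2 \times 0.90 \times 0.88 / (0.90 + 0.88) \approx 0.89$.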
Tab. 3 Comparison of common evaluation indexes for different text detection methods on dataset ICDAR2015

Method          | R    | P    | F
CTPN[7]         | 0.52 | 0.74 | 0.61
RTLF[9]         | 0.82 | 0.72 | 0.77
RRPN[11]        | 0.77 | 0.84 | 0.80
EAST[12]        | 0.78 | 0.83 | 0.81
RRD[30]         | 0.80 | 0.88 | 0.84
RefineScale-RPN | 0.85 | 0.81 | 0.83
Fig. 7 Text detection results of proposed method on typical test images
[1] YAO Cong. Research on text detection and recognition in natural images [D]. Wuhan: Huazhong University of Science and Technology, 2014.
[2] YANG Fei. A survey of text detection in natural scene images [J]. Electronic Design Engineering, 2016, 24(24): 165-168.
[3] DONOSER M, BISCHOF H. Efficient maximally stable extremal region (MSER) tracking [C]// Computer Vision and Pattern Recognition. New York: IEEE, 2006: 553-560.
[4] EPSHTEIN B, OFEK E, WEXLER Y. Detecting text in natural scenes with stroke width transform [C]// Computer Vision and Pattern Recognition. San Francisco: IEEE, 2010: 2963-2970.
[5] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks [C]// International Conference on Neural Information Processing Systems. Lake Tahoe: ACM, 2012: 1097-1105.
[6] ZHOU Fei-yan, JIN Lin-peng, DONG Jun. Review of convolutional neural networks [J]. Chinese Journal of Computers, 2017, 40(6): 1229-1251. doi: 10.11897/SP.J.1016.2017.01229
[7] TIAN Z, HUANG W, HE T, et al. Detecting text in natural image with connectionist text proposal network [C]// European Conference on Computer Vision. Amsterdam: Springer, 2016: 56-72.
[8] YIN X C, YIN X, HUANG K, et al. Robust text detection in natural scene images [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(5): 970-983. doi: 10.1109/TPAMI.2013.182
[9] NEUMANN L, MATAS J. Real-time lexicon-free scene text localization and recognition [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(9): 1872-1885. doi: 10.1109/TPAMI.2015.2496234
[10] BUTA M, NEUMANN L, MATAS J. FASText: efficient unconstrained scene text detector [C]// Proceedings of the IEEE International Conference on Computer Vision. Santiago: IEEE, 2015: 1206-1214.
[11] MA J, SHAO W, YE H, et al. Arbitrary-oriented scene text detection via rotation proposals [J]. IEEE Transactions on Multimedia, 2018, 20(11): 3111-3122.
[12] ZHOU X, YAO C, WEN H, et al. EAST: an efficient and accurate scene text detector [C]// Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 2642-2651.
[13] LIAO M, SHI B, BAI X, et al. TextBoxes: a fast text detector with a single deep neural network [C]// Thirty-First AAAI Conference on Artificial Intelligence. San Francisco: AAAI, 2017: 4161-4167.
[14] HONG S, ROH B, KIM K H, et al. PVANet: lightweight deep neural networks for real-time object detection [C]// Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016. arXiv: 1611.08588.
[15] DENG D, LIU H, LI X, et al. PixelLink: detecting scene text via instance segmentation [C]// Thirty-Second AAAI Conference on Artificial Intelligence. New Orleans: AAAI, 2018: 6773-6780.
[16] SHRIVASTAVA A, GUPTA A, GIRSHICK R. Training region-based object detectors with online hard example mining [C]// Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 761-769.
[17] ANTHIMOPOULOS M, GATOS B, PRATIKAKIS I. A two-stage scheme for text detection in video images [J]. Image and Vision Computing, 2010, 28(9): 1413-1426. doi: 10.1016/j.imavis.2010.03.004
[18] LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, PP(99): 2999-3007.
[19] LIN T Y, DOLLAR P, GIRSHICK R, et al. Feature pyramid networks for object detection [C]// Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 936-944.
[20] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition [C]// Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 770-778.
[21] LIN G, MILAN A, SHEN C, et al. RefineNet: multi-path refinement networks for high-resolution semantic segmentation [C]// Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 1925-1934.
[22] REN S, HE K, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks [C]// Advances in Neural Information Processing Systems. Montreal: ACM, 2015: 91-99.
[23] UIJLINGS J R R, VAN DE SANDE K E A, GEVERS T, et al. Selective search for object recognition [J]. International Journal of Computer Vision, 2013, 104(2): 154-171. doi: 10.1007/s11263-013-0620-5
[24] ZITNICK C L, DOLLAR P. Edge boxes: locating object proposals from edges [C]// European Conference on Computer Vision. Zurich: Springer, 2014: 391-405.
[25] VEIT A, MATERA T, NEUMANN L, et al. COCO-Text: dataset and benchmark for text detection and recognition in natural images [C]// Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016. arXiv: 1601.07140.
[26] KARATZAS D, SHAFAIT F, UCHIDA S, et al. ICDAR 2013 robust reading competition [C]// 12th International Conference on Document Analysis and Recognition. Washington: IEEE, 2013: 1484-1493.
[27] KARATZAS D, GOMEZ-BIGORDA L, NICOLAOU A, et al. ICDAR 2015 competition on robust reading [C]// 13th International Conference on Document Analysis and Recognition. Nancy: IEEE, 2015: 1156-1160.
[28] WOLF C, JOLION J M. Object count/area graphs for the evaluation of object detection and segmentation algorithms [J]. International Journal of Document Analysis and Recognition, 2006, 8(4): 280-296. doi: 10.1007/s10032-006-0014-0
[29] LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector [C]// European Conference on Computer Vision. Amsterdam: Springer, 2016: 21-37.