Journal of Zhejiang University (Science Edition)  2021, Vol. 48 Issue (1): 1-8    DOI: 10.3785/j.issn.1008-9497.2021.01.001
Image Understanding and Data Sorting     
A three-stage text recognition framework for natural scene images
ZOU Beiji1,2, YANG Wenjun1,2, LIU Shu1,2, JIANG Lingzi1,2
1.School of Computer Science and Engineering, Central South University, Changsha 410083, China
2.Hunan Engineering Research Center of Machine Vision and Intelligent Medicine, Changsha 410083, China

Abstract  Text recognition technology plays an important role in applications such as document management, image understanding, and visual navigation. However, text in natural scenes often appears in arbitrary orientations, varied shapes, and diverse fonts, which makes it difficult to detect and recognize. For natural scene images with irregular text, a three-stage text recognition framework is proposed, comprising text detection, rectification, and recognition. First, a feature pyramid network segments the character instances, and the affinity among them is predicted by a bidirectional long short-term memory network, so that isolated characters are grouped into words; the reported F-score of text detection reaches 91.97%. The detected words are then rectified by a multi-object rectification network, which handles complicated distortions of scene text and improves its readability. Finally, an attention-based sequence recognition network outputs its predictions in sequence to achieve word-level recognition, with a recognition accuracy of 84.98%.
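
The grouping step of the detection stage and the F-score used to report it can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the characters have already been segmented and ordered along a single line, that an affinity score in [0, 1] is available for each adjacent pair (in the paper these come from a bidirectional LSTM), and that the F-score is the standard harmonic mean of precision and recall. The names `group_characters`, `f_score`, and the `threshold` parameter are hypothetical.

```python
from typing import List, Sequence


def group_characters(chars: Sequence[str],
                     affinities: Sequence[float],
                     threshold: float = 0.5) -> List[str]:
    """Group an ordered character sequence into words.

    affinities[i] is an assumed score in [0, 1] for linking chars[i]
    to chars[i + 1]; a score below `threshold` starts a new word.
    """
    if len(chars) == 0:
        return []
    if len(affinities) != len(chars) - 1:
        raise ValueError("need exactly one affinity per adjacent character pair")

    words = [chars[0]]
    for ch, score in zip(chars[1:], affinities):
        if score >= threshold:
            words[-1] += ch      # strong link: extend the current word
        else:
            words.append(ch)     # weak link: start a new word
    return words


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative inputs only; these are not data from the paper.
    scores = [0.9, 0.8, 0.95, 0.1, 0.9, 0.9, 0.85, 0.9]
    print(group_characters(list("STOPAHEAD"), scores))
    print(f_score(0.5, 0.5))
```

A fixed threshold on adjacent pairs is the simplest possible linking rule; real scene text is two-dimensional and often curved, so a practical detector would first need to establish character order and would likely link characters with a learned model rather than a constant cutoff.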

Key words: text detection, text rectification, natural scene, text recognition
Received: 23 September 2020      Published: 20 January 2021
CLC:  TP 391.41  
Cite this article:

ZOU Beiji, YANG Wenjun, LIU Shu, JIANG Lingzi. A three-stage text recognition framework for natural scene images. Journal of Zhejiang University (Science Edition), 2021, 48(1): 1-8.

URL:

https://www.zjujournals.com/sci/EN/Y2021/V48/I1/1


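A three-stage text recognition framework for natural scene images

Text recognition technology has important applications in document management, image understanding, visual navigation, and related areas. However, text in natural scenes is usually arbitrarily arranged, varied in shape, and diverse in font, which makes it hard to detect and recognize. A three-stage text recognition framework for natural scene images is proposed, consisting of text detection, text rectification, and text recognition. First, a feature pyramid network segments the characters in the image, a bidirectional long short-term memory network estimates the affinity between characters, and isolated characters are linked into word lines; the text detection rate (F-score) reaches 91.97%. Next, a multi-object rectification network rectifies the detected text to cope with the complex deformations of text in scene images and to improve readability. Finally, an attention-based sequence recognition network outputs predictions in order to achieve word-level recognition, with a text recognition accuracy of 84.98%.

Key words: text recognition, natural scene, text detection, text rectification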