A three-stage text recognition framework for natural scene images
ZOU Beiji (1,2), YANG Wenjun (1,2), LIU Shu (1,2), JIANG Lingzi (1,2)
1. School of Computer Science and Engineering, Central South University, Changsha 410083, China
2. Hunan Engineering Research Center of Machine Vision and Intelligent Medicine, Changsha 410083, China
Abstract: Text recognition technology plays an important role in applications such as document management, image understanding, and visual navigation. However, text in natural scenes often appears in arbitrary orientations, varied shapes, and diverse fonts, which makes it difficult to detect and recognize. To handle natural scene images containing irregular text, a three-stage text recognition framework is proposed, consisting of text detection, rectification, and recognition. First, a feature pyramid network segments the character instances, and a bidirectional long short-term memory (BiLSTM) network predicts the affinity among them, so that isolated characters can be grouped into words. The F-score of text detection reaches up to 91.97%. Second, the detected words are rectified by a multi-object rectification network, which handles complicated distortions of scene text and improves its readability. Finally, an attention-based sequence recognition network outputs its predictions in sequence to achieve word-level recognition, with a recognition accuracy of up to 84.98%.
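The grouping step in the detection stage can be illustrated with a minimal sketch. It assumes the affinity predictor has already produced a score in [0, 1] for each candidate pair of characters, and it links pairs above a threshold using union-find. The function name `group_characters`, the dictionary layout of the scores, the 0.5 threshold, and the use of union-find are all assumptions made for illustration; the abstract does not specify how the predicted affinities are turned into word groups.

```python
from typing import Dict, List, Tuple


def group_characters(
    num_chars: int,
    affinities: Dict[Tuple[int, int], float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[List[int]]:
    """Group character indices into words.

    Two characters end up in the same word when they are connected by a
    chain of pairs whose affinity score exceeds `threshold`.
    """
    parent = list(range(num_chars))

    def find(i: int) -> int:
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (a, b), score in affinities.items():
        if score > threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    # Collect the members of each connected component.
    groups: Dict[int, List[int]] = {}
    for i in range(num_chars):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Five detected characters; scores are made-up illustrative values.
    scores = {(0, 1): 0.9, (1, 2): 0.8, (2, 3): 0.1, (3, 4): 0.7}
    print(group_characters(5, scores))
```

With the made-up scores above, characters 0 to 2 form one word and characters 3 and 4 form another, because the low affinity between characters 2 and 3 keeps the two groups apart.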
ZOU Beiji, YANG Wenjun, LIU Shu, JIANG Lingzi. A three-stage text recognition framework for natural scene images[J]. Journal of Zhejiang University (Science Edition), 2021, 48(1): 1-8.