A three-stage text recognition framework for natural scene images

doi:10.3785/j.issn.1008-9497.2021.01.001

Journal of Zhejiang University (Science Edition)

2021, Vol. 48

Issue (1): 1-8

Image Understanding and Data Analysis

A three-stage text recognition framework for natural scene images

ZOU Beiji^1,2, YANG Wenjun^1,2, LIU Shu^1,2, JIANG Lingzi^1,2

1.School of Computer Science and Engineering, Central South University, Changsha 410083, China
2.Hunan Engineering Research Center of Machine Vision and Intelligent Medicine, Changsha 410083, China

Abstract: Text recognition technology plays an important role in applications such as document management, image understanding, and visual navigation. However, text in natural scenes often has arbitrary orientation, varied shapes, and diverse fonts, which makes it difficult to detect and recognize. A three-stage text recognition framework for natural scene images with irregular text is proposed, comprising text detection, rectification, and recognition. First, a feature pyramid network segments the character instances, and a bidirectional long short-term memory network predicts the affinity among them, so that isolated characters are grouped into words; the F-score of text detection reaches 91.97%. The detected words are then rectified by a multi-object rectification network, which handles the complicated distortion of scene text and improves its readability. Finally, an attention-based sequence recognition network outputs the predictions in sequence to achieve word-level recognition, with a recognition accuracy of 84.98%.
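
The character-grouping step of the detection stage can be illustrated with a minimal sketch. It assumes that pairwise affinity scores between detected characters are already available as plain numbers (in the framework they are predicted by the bidirectional LSTM; that network is not reproduced here). The function name `group_characters`, the dictionary layout, and the 0.5 threshold are hypothetical choices for illustration, and the union-find linking is a generic stand-in, not the authors' implementation.

```python
from typing import Dict, List, Tuple


def group_characters(
    num_chars: int,
    affinities: Dict[Tuple[int, int], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[List[int]]:
    """Link characters whose pairwise affinity reaches the threshold.

    Characters are identified by index 0..num_chars-1. Each returned
    list holds the indices of one word, in increasing order.
    """
    parent = list(range(num_chars))

    def find(i: int) -> int:
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (a, b), score in affinities.items():
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                # Attach the larger root under the smaller one.
                parent[max(root_a, root_b)] = min(root_a, root_b)

    groups: Dict[int, List[int]] = {}
    for i in range(num_chars):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical scores for five characters: 0-1-2 are strongly linked,
# 2-3 is a word gap, and 3-4 are linked.
example_affinities = {(0, 1): 0.9, (1, 2): 0.8, (2, 3): 0.1, (3, 4): 0.7}
words = group_characters(5, example_affinities)
print(words)
```

In a full pipeline each resulting group would be turned into a word-level region and passed on to the rectification and recognition stages.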

Key words: text recognition; natural scene; text detection; text rectification

Received: 2020-09-23 Published: 2021-01-20

CLC: TP 391.41

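Funding: National Natural Science Foundation of China (61902435); Major Project of the Ministry of Science and Technology (2018AAA0102102); Hunan Provincial Science and Technology Program (2017WK2074); Discipline Innovation and Talent Introduction Base Project of the Ministry of Education (B18059); Hunan Provincial Natural Science Foundation (2019JJ50808); 2020 Undergraduate Innovation and Entrepreneurship Training Program (GCX2020325Y).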

Corresponding author: ORCID: http://orcid.org/0000-0003-0797-5807, E-mail: sliu35@csu.edu.cn

About the author: ZOU Beiji (born 1961), ORCID: http://orcid.org/0000-0002-3542-1097, male, Ph.D., professor; his research focuses on computer vision and image processing.

Cite this article:

ZOU Beiji, YANG Wenjun, LIU Shu, JIANG Lingzi. A three-stage text recognition framework for natural scene images. Journal of Zhejiang University (Science Edition), 2021, 48(1): 1-8.

Link to this article:

https://www.zjujournals.com/sci/CN/10.3785/j.issn.1008-9497.2021.01.001 or https://www.zjujournals.com/sci/CN/Y2021/V48/I1/1
