Journal of Zhejiang University (Engineering Science)  2023, Vol. 57 Issue (5): 948-956    DOI: 10.3785/j.issn.1008-973X.2023.05.011
Computer Technology and Control Engineering
Chip surface character recognition based on convolutional recurrent neural network
Fan XIONG, Tian CHEN*, Bai-cheng BIAN, Jun LIU
School of Mechanical Engineering, Shanghai Dianji University, Shanghai 201306, China
Abstract:

A character recognition method based on an improved convolutional recurrent neural network (CRNN) was proposed for recognizing characters on chip surfaces. The image was binarized by threshold segmentation based on integral image operations, and the orientation of the text field image was corrected by an affine transformation, so that the text lines could be localized. Starting from the original CRNN, the backbone network was replaced with a MobileNet-V3 structure, an attention mechanism was added between the two LSTM layers, and a center loss function was introduced. The improved CRNN was used to recognize the characters in each text line and was tested on 40 510 chip text line images. Multiple sub-models were obtained by fine-tuning the model on small-sample datasets, which enabled integrated (ensemble) inference. With three models, the combined recognition accuracy was stable at about 99.97%, and the total recognition time for a single chip image was less than 60 ms. The experimental results showed that the accuracy of the improved CRNN was about 27.48 percentage points higher than that of the original CRNN, and that multi-model integrated inference could achieve still higher accuracy.

Key words: image processing    integral image    convolutional recurrent neural network    character recognition    integrated inference
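Received: 2021-12-25    Published: 2023-05-09
CLC:  TP 391
Funding: Shanghai Local Colleges and Universities Capacity Building Program (22010501000); Shanghai Engineering Research Center of Multi-directional Die Forging (20DZ2253200)
Corresponding author: Tian CHEN     E-mail: 2404440261@qq.com;chent@sdju.edu.cn
About the author: Fan XIONG (born 1996), male, master's student, working on computer vision. orcid.org/0000-0002-4167-2255. E-mail: 2404440261@qq.com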

Cite this article:

Fan XIONG, Tian CHEN, Bai-cheng BIAN, Jun LIU. Chip surface character recognition based on convolutional recurrent neural network[J]. Journal of Zhejiang University (Engineering Science), 2023, 57(5): 948-956.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2023.05.011        https://www.zjujournals.com/eng/CN/Y2023/V57/I5/948

Fig. 1  Example of integral image calculation
Fig. 2  Overall workflow of text line recognition
Fig. 3  Structure of the original CRNN
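The abstract and Fig. 1 refer to binarization by threshold segmentation built on an integral image. The following is a minimal pure-Python sketch of that general idea, not the authors' implementation: the local-mean rule, the `window` and `ratio` parameters, and the dark-text-on-light-background convention are all assumptions made for illustration.

```python
def integral_image(img):
    """Return a (h+1) x (w+1) summed-area table with a zero first row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, y0, x0, y1, x1):
    """Sum of the pixels in rows y0..y1-1 and columns x0..x1-1, using four lookups."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]


def adaptive_threshold(img, window=3, ratio=0.85):
    """Mark a pixel as foreground (1) when it is darker than ratio * local mean.

    window and ratio are illustrative values, not taken from the paper.
    """
    h, w = len(img), len(img[0])
    ii = integral_image(img)
    half = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clip the window at the image border.
            y0, y1 = max(0, y - half), min(h, y + half + 1)
            x0, x1 = max(0, x - half), min(w, x + half + 1)
            area = (y1 - y0) * (x1 - x0)
            mean = box_sum(ii, y0, x0, y1, x1) / area
            out[y][x] = 1 if img[y][x] < ratio * mean else 0
    return out
```

Because every window sum costs four table lookups regardless of window size, the per-pixel cost of the local mean stays constant, which is the usual reason for pairing adaptive thresholding with an integral image.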
| Type | Description | Activation | Kernel (k) | Stride (s) | Padding (p) |
| --- | --- | --- | --- | --- | --- |
| Input | W×32 | - | - | - | - |
| Conv Layer 1 | 3→32 | h-swish | (3,3) | (2,2) | (1,1) |
| Conv Block 1 | 32→32 | relu | (1,1)-(3,3)-(1,1) | (1,1)-(1,1)-(1,1) | (0,0)-(1,1)-(0,0) |
| Conv Block 2 | 32→48 | relu | (1,1)-(3,3)-(1,1) | (1,1)-(2,2)-(1,1) | (0,0)-(1,1)-(0,0) |
| Conv Block 3 | 48→48 | relu | (1,1)-(3,3)-(1,1) | (1,1)-(1,1)-(1,1) | (0,0)-(1,1)-(0,0) |
| Conv Block 4 | 48→64 | h-swish | (1,1)-(3,3)-(1,1) | (1,1)-(2,2)-(1,1) | (0,0)-(1,1)-(0,0) |
| Conv Block 5 | 64→96 | h-swish | (1,1)-(3,3)-(1,1)-(1,1) | (1,1)-(1,1)-(1,1)-(1,1) | (0,0)-(1,1)-(0,0)-(0,0) |
| Conv Block 6 | 96→128 | h-swish | (1,1)-(3,3)-(1,1)-(1,1) | (1,1)-(1,1)-(1,1)-(1,1) | (0,0)-(1,1)-(0,0)-(0,0) |
| Conv Block 7 | 128→256 | h-swish | (1,1)-(3,3)-(1,1) | (1,1)-(2,1)-(1,1) | (0,0)-(1,1)-(0,0) |
| Conv Block 8 | 256→256 | h-swish | (1,1)-(3,3)-(1,1) | (1,1)-(1,1)-(1,1) | (0,0)-(1,1)-(0,0) |
| Conv Layer 2 | 256→512 | h-swish | (2,2) | (1,1) | (0,0) |
| Output | 512×1×40 | - | - | - | - |
Table 1  Structure of the improved CNN module
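To see how the kernel, stride and padding values in Table 1 collapse a 32-pixel-high input to a feature map of height 1, the sketch below traces the spatial size layer by layer with the standard convolution output formula. It assumes each tuple in the table is ordered (height, width) and that the output is written as channels×height×width. The input width W is not stated on this page, so the width of 328 used here is only an assumed value that happens to reproduce the listed output width of 40 under these assumptions.

```python
def conv_out(size, k, s, p):
    """Standard convolution output size: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * p - k) // s + 1


K1, K3, S1, P0, P1 = (1, 1), (3, 3), (1, 1), (0, 0), (1, 1)


def block(mid_stride, extra_pointwise=False):
    """1x1 -> 3x3 -> 1x1 block; blocks 5 and 6 list one more 1x1 convolution."""
    convs = [(K1, S1, P0), (K3, mid_stride, P1), (K1, S1, P0)]
    if extra_pointwise:
        convs.append((K1, S1, P0))
    return convs


# (name, output channels, [(kernel, stride, padding), ...]) transcribed from Table 1.
LAYERS = [
    ("Conv Layer 1", 32, [(K3, (2, 2), P1)]),
    ("Conv Block 1", 32, block(S1)),
    ("Conv Block 2", 48, block((2, 2))),
    ("Conv Block 3", 48, block(S1)),
    ("Conv Block 4", 64, block((2, 2))),
    ("Conv Block 5", 96, block(S1, extra_pointwise=True)),
    ("Conv Block 6", 128, block(S1, extra_pointwise=True)),
    ("Conv Block 7", 256, block((2, 1))),
    ("Conv Block 8", 256, block(S1)),
    ("Conv Layer 2", 512, [((2, 2), S1, P0)]),
]


def trace(height, width):
    """Return (name, channels, height, width) after each row of the table."""
    shapes = []
    for name, channels, convs in LAYERS:
        for k, s, p in convs:
            height = conv_out(height, k[0], s[0], p[0])
            width = conv_out(width, k[1], s[1], p[1])
        shapes.append((name, channels, height, width))
    return shapes


if __name__ == "__main__":
    for row in trace(32, 328):  # 328 is an assumed input width
        print(row)
```

Under these assumptions the height goes 32, 16, 8, 4, 2, 1. The (2,1) stride in Conv Block 7 halves only the height, so the horizontal resolution that feeds the recurrent layers is preserved.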
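Fig. 4  Curves of the swish and h-swish activation functions
Fig. 5  Structure of the improved LSTM module
Fig. 6  Multi-model integrated inference
Fig. 7  Comparison of threshold segmentation results of different algorithms
Fig. 8  Orientation correction of the text field region
Fig. 9  Images from the base dataset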
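Fig. 8 concerns orientation correction of the text field region, which the abstract attributes to an affine transformation. The sketch below shows one plausible form of that step: estimate the skew angle of a text baseline and build a 2×3 affine matrix that rotates points about a chosen center to undo it. How the paper actually estimates the angle is not stated on this page, so `skew_angle` and the two-point baseline are hypothetical stand-ins.

```python
import math


def rotation_about(center, angle_deg):
    """2x3 affine matrix that rotates points by angle_deg about center = (cx, cy).

    The formula uses ordinary mathematical axes; in image coordinates (y pointing
    down) the same matrix applies, but the visual sense of rotation is mirrored.
    """
    cx, cy = center
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    return [
        [cos_a, -sin_a, cx - cos_a * cx + sin_a * cy],
        [sin_a, cos_a, cy - sin_a * cx - cos_a * cy],
    ]


def apply_affine(m, point):
    """Map one (x, y) point through the 2x3 affine matrix m."""
    x, y = point
    return (
        m[0][0] * x + m[0][1] * y + m[0][2],
        m[1][0] * x + m[1][1] * y + m[1][2],
    )


def skew_angle(p0, p1):
    """Angle in degrees of the line from p0 to p1 relative to the x-axis."""
    return math.degrees(math.atan2(p1[1] - p0[1], p1[0] - p0[0]))


# Hypothetical baseline endpoints of a tilted text field.
start, end = (0.0, 0.0), (100.0, 100.0)
correction = rotation_about(start, -skew_angle(start, end))
leveled_end = apply_affine(correction, end)  # y is ~0: the baseline is horizontal
```

In a full pipeline the same matrix would be used to resample the image itself; applying it to points, as here, is enough to check that the correction levels the baseline.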
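| IC text line image | ACC/% (initial training) | ACC/% (secondary training) | ACC/% (multi-model combined) |
| --- | --- | --- | --- |
| (image) | 99.946 | 99.872 | 99.981 |
| (image) | 99.965 | 99.958 | 99.973 |
| (image) | 99.876 | 99.953 | 99.963 |
| (image) | 99.891 | 99.968 | 99.976 |
| (image) | 99.936 | 99.963 | 99.973 |
| (image) | 99.662 | 97.759 | 99.847 |
Table 2  Accuracy test results of integrated inference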
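Table 2 reports a higher combined accuracy than either single training run in every row. This page does not say how the three sub-models' outputs are combined, so the sketch below uses plain majority voting as a stand-in to show why combining can help. The chip strings and model outputs are hypothetical.

```python
from collections import Counter


def vote(predictions):
    """Majority vote over the strings several models predict for one text line.

    Ties (for example three different answers) fall back to the earliest model.
    """
    counts = Counter(predictions)
    top = max(counts.values())
    for text in predictions:
        if counts[text] == top:
            return text


def accuracy(predicted, expected):
    """Percentage of text lines whose predicted string matches the label exactly."""
    hits = sum(p == e for p, e in zip(predicted, expected))
    return 100.0 * hits / len(expected)


# Hypothetical outputs of three fine-tuned sub-models on three text lines.
model_outputs = [
    ["STM32F103", "AB1234", "XY-8"],
    ["STM32F1O3", "AB1234", "XY-9"],
    ["STM32F103", "AB1284", "XY-9"],
]
labels = ["STM32F103", "AB1234", "XY-9"]

combined = [vote(list(per_line)) for per_line in zip(*model_outputs)]
single = [accuracy(outputs, labels) for outputs in model_outputs]
```

In this toy case each model misreads a different line, so every individual accuracy is about 66.7% while the voted result is 100%. The benefit depends on the sub-models making different mistakes, which is plausible when each is fine-tuned on a different small dataset.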
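| Model variant | Accuracy A/% | Time T/ms |
| --- | --- | --- |
| Original CRNN | 67.353 | 25.04 |
| With improved CNN | 74.068 | 14.58 |
| With improved LSTM | 78.474 | 19.14 |
| With improved loss function | 90.751 | 23.90 |
| All improvements combined | 94.831 | 11.85 |
Table 3  Comparison test results for each CRNN improvement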
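The improvement figure quoted in the abstract can be checked directly against Table 3. The short sketch below computes, for each variant, the gain over the original CRNN in percentage points, the relative gain, and the speed-up. The English variant labels are translations used only as dictionary keys.

```python
# (accuracy in %, time in ms) per model variant, copied from Table 3.
TABLE_3 = {
    "Original CRNN": (67.353, 25.04),
    "With improved CNN": (74.068, 14.58),
    "With improved LSTM": (78.474, 19.14),
    "With improved loss function": (90.751, 23.90),
    "All improvements combined": (94.831, 11.85),
}


def compare_to_baseline(table, baseline="Original CRNN"):
    """Return {variant: (gain in points, relative gain in %, speed-up factor)}."""
    base_acc, base_time = table[baseline]
    return {
        name: (acc - base_acc, (acc - base_acc) / base_acc * 100.0, base_time / time_ms)
        for name, (acc, time_ms) in table.items()
    }


if __name__ == "__main__":
    for name, (points, relative, speedup) in compare_to_baseline(TABLE_3).items():
        print(f"{name}: +{points:.3f} points, +{relative:.1f}% relative, {speedup:.2f}x speed-up")
```

For the combined model this gives 94.831 - 67.353 = 27.478 percentage points, which matches the 27.48 figure in the abstract when it is read as an absolute difference. The corresponding relative gain is about 40.8%, and the time per inference drops by a factor of about 2.1.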