Method with recording text classification based on deep learning

doi:10.3785/j.issn.1008-973X.2020.07.003

Journal of ZheJiang University (Engineering Science), 2020, Vol. 54, Issue 7: 1264-1271


Yan-nan ZHANG1, Xiao-hong HUANG1, Yan MA1,*, Qun CONG2

1. Information Network Center, Beijing University of Posts and Telecommunications, Beijing 100876, China
2. Beijing Wrdtech Limited Company, Beijing 100876, China


Abstract

A deep-learning-based classification method was designed according to the characteristics of recording text and its associated data, in order to improve the classification precision of recording text that has associated work order data. Vector representations of the recording text and of the work order information were obtained with the bidirectional language model ELMo. Based on these word embeddings, convolutional neural networks (CNN) were used to mine local features of the sentences, and separate CNNs were used to mine the title and the description of the work order. The features extracted by the CNNs were weighted and concatenated, and the result was fed into a bidirectional gated recurrent unit (GRU) to capture contextual semantic features. An attention mechanism was introduced to assign different weights to the output states of the GRU hidden layer. The experimental results show that, compared with existing algorithms, the method converges faster and achieves higher accuracy.

Key words: word vector, convolutional neural networks (CNN), bidirectional gated recurrent unit, attention, text classification
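The pipeline summarized in the abstract can be illustrated as a forward pass in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: random vectors stand in for ELMo embeddings, all layer sizes are hypothetical (not the values in Tab.2), the weights are untrained, and the sentence-level arrangement (one BiGRU time step per recording-text sentence, inferred from the title of Fig.3) is an assumption. Because the abstract does not say exactly how the weighting factor is applied, the sketch assumes one scalar per source (recording sentence, work-order title, work-order description) that scales that source's CNN features before concatenation. The names `text_cnn`, `bigru`, `attention`, `classify` and the `weights` parameter are hypothetical.

```python
import math
import random

rng = random.Random(0)
EMB, FILT, K, HID, N_CLASSES = 8, 4, 3, 5, 3  # hypothetical sizes, not the Tab.2 values


def rand_mat(rows, cols):
    return [[rng.gauss(0.0, 0.3) for _ in range(cols)] for _ in range(rows)]


def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def vadd(*vs):
    return [sum(t) for t in zip(*vs)]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-t)) for t in v]


def softmax(v):
    m = max(v)
    e = [math.exp(t - m) for t in v]
    s = sum(e)
    return [t / s for t in e]


def text_cnn(words, W, b):
    """Sentence-level CNN (Kim-style): flatten each width-K window of word vectors,
    apply FILT filters + ReLU, then max-pool over time. Needs len(words) >= K."""
    windows = [sum(words[i:i + K], []) for i in range(len(words) - K + 1)]
    maps = [[max(0.0, t) for t in vadd(matvec(W, win), b)] for win in windows]
    return [max(col) for col in zip(*maps)]


def gru_params(d_in):
    # one (W, U, b) triple per gate: update "z", reset "r", candidate "h"
    return {g: (rand_mat(HID, d_in), rand_mat(HID, HID), [0.0] * HID) for g in "zrh"}


def gate(p, g, x, h):
    W, U, b = p[g]
    return vadd(matvec(W, x), matvec(U, h), b)


def gru(xs, p):
    """GRU in the Cho et al. (2014) form: h_t = z * h_{t-1} + (1 - z) * h~_t."""
    h, states = [0.0] * HID, []
    for x in xs:
        z = sigmoid(gate(p, "z", x, h))
        r = sigmoid(gate(p, "r", x, h))
        cand = [math.tanh(t) for t in gate(p, "h", x, [ri * hi for ri, hi in zip(r, h)])]
        h = [zi * hi + (1.0 - zi) * ci for zi, hi, ci in zip(z, h, cand)]
        states.append(h)
    return states


def bigru(xs, p_fwd, p_bwd):
    fwd = gru(xs, p_fwd)
    bwd = gru(xs[::-1], p_bwd)[::-1]  # re-align backward states with time steps
    return [f + bk for f, bk in zip(fwd, bwd)]  # list concat -> 2 * HID per step


def attention(H, Wa, ba, u):
    """Additive attention: score_t = u . tanh(Wa h_t + ba), alpha = softmax(score)."""
    scores = [sum(ui * math.tanh(t) for ui, t in zip(u, vadd(matvec(Wa, h), ba))) for h in H]
    alpha = softmax(scores)
    context = [sum(a * h[j] for a, h in zip(alpha, H)) for j in range(len(H[0]))]
    return context, alpha


D_IN = 3 * FILT  # sentence features + title features + description features
cnn = {name: (rand_mat(FILT, K * EMB), [0.0] * FILT) for name in ("rec", "title", "desc")}
p_fwd, p_bwd = gru_params(D_IN), gru_params(D_IN)
Wa, ba, u = rand_mat(2 * HID, 2 * HID), [0.0] * (2 * HID), rand_mat(1, 2 * HID)[0]
W_out, b_out = rand_mat(N_CLASSES, 2 * HID), [0.0] * N_CLASSES


def classify(sentences, title, desc, weights=(1.0, 0.5, 0.5)):
    """sentences: one word-vector list per recording-text sentence; title/desc:
    work-order word vectors. weights: hypothetical factors for (rec, title, desc)."""
    w_rec, w_title, w_desc = weights
    order = [w_title * t for t in text_cnn(title, *cnn["title"])]
    order += [w_desc * t for t in text_cnn(desc, *cnn["desc"])]
    steps = [[w_rec * t for t in text_cnn(s, *cnn["rec"])] + order for s in sentences]
    context, alpha = attention(bigru(steps, p_fwd, p_bwd), Wa, ba, u)
    return softmax(vadd(matvec(W_out, context), b_out)), alpha


def random_text(n_words):  # stand-in for ELMo vectors of an n-word text
    return [[rng.gauss(0.0, 1.0) for _ in range(EMB)] for _ in range(n_words)]


probs, alpha = classify([random_text(n) for n in (7, 5, 9)], random_text(4), random_text(12))
print([round(p, 3) for p in probs], [round(a, 3) for a in alpha])
```

In this sketch the weighted work-order features are appended to every sentence step, so the BiGRU sees the work-order context at each position; the returned `alpha` shows how much each sentence contributes to the pooled representation. Training (loss and back-propagation) is omitted.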

Received: 30 July 2019 Published: 05 July 2020

CLC: TP 391

Corresponding Authors: Yan MA. E-mail: knightzyn@163.com; mayan@bupt.edu.cn


Cite this article:

Yan-nan ZHANG,Xiao-hong HUANG,Yan MA,Qun CONG. Method with recording text classification based on deep learning. Journal of ZheJiang University (Engineering Science), 2020, 54(7): 1264-1271.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2020.07.003 OR http://www.zjujournals.com/eng/Y2020/V54/I7/1264


Fig.1 Schematic diagram of the recording text classification model

Fig.2 ELMo model structure diagram

Fig.3 Sentence-level CNN model structure

Fig.4 Schematic diagram of the weighted concatenation of CNN features

Fig.5 GRU structure diagram

Fig.6 Attention model structure diagram

Tab.1 Distribution of the experimental data for the recording text classification method

Tab.2 Neural network parameter values of the recording text classification model

Tab.3 Confusion matrix

Tab.4 Precision, recall and weighting factors

Tab.5 Statistics of the comparative experiment results for text classification methods
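Tab.3 and Tab.4 report a confusion matrix and per-class precision and recall. The sketch below shows how these metrics follow from a confusion matrix; it assumes rows index the true class and columns the predicted class, and the counts are made up for illustration, not taken from Tab.3. This page does not say what the weighting factor in Tab.4 denotes, so the support-weighted averages here are one common convention, not a reconstruction of that column. The function name `per_class_metrics` is hypothetical.

```python
def per_class_metrics(cm):
    """cm[i][j]: number of samples with true class i predicted as class j.
    Returns per-class precision and recall plus support-weighted averages."""
    n = len(cm)
    tp = [cm[i][i] for i in range(n)]
    predicted = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # column sums
    support = [sum(row) for row in cm]                                # row sums
    precision = [t / p if p else 0.0 for t, p in zip(tp, predicted)]
    recall = [t / s if s else 0.0 for t, s in zip(tp, support)]
    total = sum(support)
    weighted = {
        "precision": sum(s * p for s, p in zip(support, precision)) / total,
        "recall": sum(s * r for s, r in zip(support, recall)) / total,
    }
    return precision, recall, weighted


cm = [[8, 2, 0],   # illustrative counts, not the paper's Tab.3
      [1, 6, 3],
      [0, 0, 10]]
precision, recall, weighted = per_class_metrics(cm)
print([round(p, 3) for p in precision], recall, weighted)
```

Note that the support-weighted recall always equals overall accuracy (here 24/30 = 0.8), because each class's recall is multiplied by its own support before dividing by the total.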


