Journal of ZheJiang University (Engineering Science)  2020, Vol. 54 Issue (7): 1264-1271    DOI: 10.3785/j.issn.1008-973X.2020.07.003
Method with recording text classification based on deep learning
Yan-nan ZHANG1, Xiao-hong HUANG1, Yan MA1,*, Qun CONG2
1. Information Network Center, Beijing University of Posts and Telecommunications, Beijing 100876, China
2. Beijing Wrdtech Limited Company, Beijing 100876, China
To improve the classification precision of recording text that has associated work order data, a deep learning classification method was designed around the characteristics of the recording text and its associated data. Embeddings of the recording text and the work order information were obtained with the bidirectional word embedding language model (ELMo). Based on these word embeddings, a convolutional neural network (CNN) mined local sentence features from the recording text, and separate CNNs mined the title and the description of the work order. The features extracted by the CNNs were concatenated using a weighting factor and then fed into a bidirectional gated recurrent unit (GRU) to capture contextual semantic features. An attention mechanism assigned different weights to the output states of the GRU hidden layer. Experimental results show that the method converges faster and achieves higher accuracy than existing algorithms.

Key words: word vector, convolutional neural networks (CNN), bidirectional gated recurrent unit, attention, text classification
Received: 30 July 2019      Published: 05 July 2020
CLC:  TP 391  
Corresponding Author: Yan MA
Cite this article:

Yan-nan ZHANG,Xiao-hong HUANG,Yan MA,Qun CONG. Method with recording text classification based on deep learning. Journal of ZheJiang University (Engineering Science), 2020, 54(7): 1264-1271.



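To improve the classification precision of recording text with associated work order data, a recording text classification method based on deep learning is designed according to the characteristics of the recording text and its associated data. Vectorized representations of the recording text and the work order information are obtained through the bidirectional word embedding language model (ELMo); based on the resulting word vectors, a convolutional neural network (CNN) mines local sentence features from the recording text. CNNs separately mine the work order title and the work order description; the CNN output features are weighted and concatenated, then fed into a bidirectional gated recurrent unit (GRU) to capture the contextual semantic features of the sentence. An attention mechanism is introduced to assign different weights to the output states of the GRU hidden layer. Experimental results show that, compared with existing algorithms, the classification method converges quickly and achieves higher accuracy.

Key words: word vector, convolutional neural network (CNN), bidirectional gated recurrent unit, attention, text classification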
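Two steps of the pipeline described in the abstract, the weighted concatenation of CNN features and the attention over GRU hidden states, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact weighting formula, so the sketch assumes the recording-text features are scaled by a factor `gamma` and the work order features by `1 - gamma`. The attention scoring function (dot product of `tanh(h_t)` with a context vector) and all function names are likewise hypothetical.

```python
import math


def weighted_concat(recording_feat, title_feat, desc_feat, gamma):
    """Concatenate CNN feature vectors using a weighting factor.

    Assumption (not stated in the source): recording-text features are
    scaled by gamma, work order title/description features by 1 - gamma.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    scaled_recording = [gamma * x for x in recording_feat]
    order_feat = list(title_feat) + list(desc_feat)
    scaled_order = [(1.0 - gamma) * x for x in order_feat]
    return scaled_recording + scaled_order


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, context):
    """Weight each hidden state and return (weights, weighted sum).

    hidden_states: list of T vectors, standing in for GRU outputs.
    context: hypothetical learned query vector of the same dimension.
    """
    scores = [
        sum(math.tanh(h) * c for h, c in zip(state, context))
        for state in hidden_states
    ]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [
        sum(w * state[i] for w, state in zip(weights, hidden_states))
        for i in range(dim)
    ]
    return weights, pooled


# Toy usage with made-up numbers.
features = weighted_concat([1.0, 2.0], [4.0], [6.0], gamma=0.7)
weights, sentence_vector = attention_pool(
    [[0.2, -0.1], [0.9, 0.4], [0.1, 0.0]], context=[1.0, 1.0]
)
```

In a real model the hidden states would come from a bidirectional GRU and the context vector would be trained jointly with the rest of the network; here both are placeholders.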
Fig.1 Schematic diagram of classification model of recording text
Fig.2 ELMo model structure diagram
Fig.3 Sentence-level CNN model structure
Fig.4 Schematic diagram of weighted concatenation of CNN features
Fig.5 GRU structure diagram
Fig.6 Attention model structure diagram
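Category                        Training samples    Validation samples
Network fault repair request    14 138              1 414
Campus card service inquiry     12 092              1 209
Information portal inquiry      10 578              1 058
Email service inquiry           9 130               913
Cloud drive service inquiry     8 259               826
Licensed software usage         7 810               781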
Tab.1 Experimental data distribution table of recording text classification method
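The split in Tab.1 can be checked with a few lines of arithmetic. The sketch below copies the counts from the table (the category names are English glosses, and the variable names are illustrative) and computes the totals, each category's share of the training set, and the validation-to-training ratio per category.

```python
# (training, validation) sample counts per category, copied from Tab.1.
# Category names are English glosses of the names in the table.
counts = {
    "network fault repair request": (14138, 1414),
    "campus card service inquiry": (12092, 1209),
    "information portal inquiry": (10578, 1058),
    "email service inquiry": (9130, 913),
    "cloud drive service inquiry": (8259, 826),
    "licensed software usage": (7810, 781),
}

train_total = sum(train for train, _ in counts.values())
val_total = sum(val for _, val in counts.values())

# Share of the training set taken by each category.
train_share = {name: train / train_total for name, (train, _) in counts.items()}

# Validation samples per training sample, per category.
val_ratio = {name: val / train for name, (train, val) in counts.items()}

for name in counts:
    print(f"{name}: share={train_share[name]:.3f}, val/train={val_ratio[name]:.4f}")
```

Every category has roughly one validation sample per ten training samples, so the validation set mirrors the class distribution of the training set.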
Model parameter          Selected value    Values tested
Word vector dimension    200               100, 200, 300
CNN kernel size          3                 3, 4, 5
Number of CNN kernels    128               64, 128, 256
Epochs                   25                10, 15, 20, 25, 30
Batch size               128               64, 128, 256
Dropout rate             0.5               0.4, 0.5, 0.6
Tab.2 Neural network parameter values of the recording text classification model
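The values in Tab.2 can be written down as a search space and a selected configuration. The sketch below is only an illustration: the key names are hypothetical, and this section does not say whether the authors searched the full grid or tuned one parameter at a time, so `full_grid` shows only what an exhaustive search over the listed values would involve.

```python
from itertools import product

# Candidate values from Tab.2. Key names are illustrative, not from the paper.
search_space = {
    "embedding_dim": [100, 200, 300],
    "kernel_size": [3, 4, 5],
    "num_kernels": [64, 128, 256],
    "epochs": [10, 15, 20, 25, 30],
    "batch_size": [64, 128, 256],
    "dropout": [0.4, 0.5, 0.6],
}

# Values reported as selected in Tab.2.
selected = {
    "embedding_dim": 200,
    "kernel_size": 3,
    "num_kernels": 128,
    "epochs": 25,
    "batch_size": 128,
    "dropout": 0.5,
}


def full_grid(space):
    """Enumerate every combination of the candidate values."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in product(*(space[k] for k in keys))]


grid = full_grid(search_space)
print(len(grid), "combinations in an exhaustive search")
```

An exhaustive search over these values would require more than a thousand training runs, which is one reason parameters are often tuned one at a time instead.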
Actual class    Predicted positive    Predicted negative
Positive        TP                    FN
Negative        FP                    TN
Tab.3 Confusion matrix
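Precision and recall follow from the confusion matrix counts in the standard way: P = TP / (TP + FP) and R = TP / (TP + FN). The sketch below computes both for a single class. The function name and the example counts are hypothetical, and this section does not state how the per-class values are averaged over the six categories.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from confusion matrix counts.

    precision = TP / (TP + FP): share of predicted positives that are correct.
    recall    = TP / (TP + FN): share of actual positives that are found.
    A zero denominator yields 0.0 rather than raising an error.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts for one class, not taken from the paper.
p, r = precision_recall(tp=90, fp=10, fn=30)
print(p, r)  # 0.9 0.75
```

TN does not appear in either formula, so both metrics stay informative even when the negative class is much larger than the positive one.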
$\gamma$    P          R
0.5         0.909 7    0.840 2
0.6         0.933 8    0.873 5
0.7         0.953 2    0.905 0
0.8         0.921 5    0.852 6
0.9         0.893 2    0.822 0
Tab.4 Precision and recall for different values of the weighting factor $\gamma$
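The choice of weighting factor can be read off Tab.4 directly. The short sketch below copies the table's values (variable names are illustrative) and picks the factor that maximizes each metric.

```python
# weighting factor -> (precision, recall), copied from Tab.4.
results = {
    0.5: (0.9097, 0.8402),
    0.6: (0.9338, 0.8735),
    0.7: (0.9532, 0.9050),
    0.8: (0.9215, 0.8526),
    0.9: (0.8932, 0.8220),
}

best_by_precision = max(results, key=lambda g: results[g][0])
best_by_recall = max(results, key=lambda g: results[g][1])

print(best_by_precision, best_by_recall)  # 0.7 0.7
```

Both metrics peak at the same value, 0.7, and fall off on either side, so there is no trade-off between precision and recall to resolve among the tested values.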
Model                    P          R
CNN                      0.734 4    0.749 5
BiLSTM                   0.873 2    0.762 3
CNN+BiLSTM               0.900 1    0.873 8
BiGRU-FCN                0.914 3    0.870 2
Attention-based C-GRU    0.933 9    0.884 0
Proposed model           0.953 2    0.905 0
Tab.5 Comparison of experimental results of text classification methods
