JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE)  2018, Vol. 52 Issue (9): 1729-1737    DOI: 10.3785/j.issn.1008-973X.2018.09.013
Computer Technology     
Double CNN sentence classification model with attention mechanism of word embeddings
GUO Bao-zhen, ZUO Wan-li, WANG Ying
College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Changchun 130012, China

Abstract  

A sentence classification model based on a double convolutional neural network with a word-embedding attention mechanism (AT-DouCNN) is proposed. It addresses two observations: different words in a sentence influence the classification result to different degrees, and the embedding of each word is limited by the single training tool that produced it. The model combines convolutional neural networks with an attention mechanism. It takes word embeddings obtained by different training algorithms as simultaneous inputs, applies convolution and pooling to each set separately, and fuses the results in the fully connected layer. As a result, the key information in a sentence is extracted more easily for a specific classification task, and the different kinds of word embeddings yield richer sentence features, which improves classification accuracy. The experimental results show that the proposed model performs competitively in sentence classification, with accuracies of 50.6%, 88.6% and 95.4% on three public datasets, respectively.
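The data flow described in the abstract (attention over word embeddings, two convolution-and-pooling channels, fusion in the fully connected layer) can be illustrated with a minimal NumPy sketch of a forward pass. This is not the authors' implementation: the abstract does not give the attention formula, filter sizes, or training procedure, so the dot-product attention against a context vector, the single filter window, the ReLU activation, the random parameters, and all names (attend, conv_max_pool, at_doucnn_forward, params) are assumptions made only for illustration.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def attend(emb, context):
    """Reweight word vectors by attention weights.

    Assumed form: score = embedding . context vector, softmax over the words.
    emb: (n_words, dim), context: (dim,)
    """
    weights = softmax(emb @ context)            # (n_words,)
    return emb * weights[:, None], weights


def conv_max_pool(emb, filters):
    """Convolution over word windows, ReLU, then max-over-time pooling.

    emb: (n_words, dim), filters: (n_filters, window, dim)
    Assumes the sentence has at least `window` words.
    """
    window = filters.shape[1]
    n_pos = emb.shape[0] - window + 1
    windows = np.stack([emb[i:i + window] for i in range(n_pos)])  # (n_pos, window, dim)
    feature_maps = np.maximum(0.0, np.einsum("pwd,fwd->pf", windows, filters))
    return feature_maps.max(axis=0)             # (n_filters,)


def at_doucnn_forward(emb_a, emb_b, params):
    """One sentence, two embedding channels, fused in a fully connected layer."""
    att_a, _ = attend(emb_a, params["ctx_a"])
    att_b, _ = attend(emb_b, params["ctx_b"])
    fused = np.concatenate([
        conv_max_pool(att_a, params["filt_a"]),
        conv_max_pool(att_b, params["filt_b"]),
    ])
    return softmax(params["W"] @ fused + params["b"])  # class probabilities


# Toy usage with random (untrained) parameters; all sizes are hypothetical.
rng = np.random.default_rng(0)
n_words, dim_a, dim_b, n_filters, window, n_classes = 7, 8, 6, 4, 3, 5
params = {
    "ctx_a": rng.normal(size=dim_a),
    "ctx_b": rng.normal(size=dim_b),
    "filt_a": rng.normal(size=(n_filters, window, dim_a)),
    "filt_b": rng.normal(size=(n_filters, window, dim_b)),
    "W": rng.normal(size=(n_classes, 2 * n_filters)),
    "b": np.zeros(n_classes),
}
emb_a = rng.normal(size=(n_words, dim_a))  # stands in for embeddings from one training algorithm
emb_b = rng.normal(size=(n_words, dim_b))  # stands in for embeddings from a second algorithm
probs = at_doucnn_forward(emb_a, emb_b, params)
print(probs.shape, probs.sum())
```

The two channels share nothing until the concatenation step, so each embedding type keeps its own dimensionality and filters; only the pooled feature vectors need compatible handling in the fully connected layer.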



Received: 27 December 2017      Published: 20 September 2018
CLC:  TP183  
Cite this article:

GUO Bao-zhen, ZUO Wan-li, WANG Ying. Double CNN sentence classification model with attention mechanism of word embeddings. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2018, 52(9): 1729-1737.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2018.09.013     OR     http://www.zjujournals.com/eng/Y2018/V52/I9/1729


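Chinese title (translated): Dual-path convolutional neural network sentence classification model using a word-embedding attention mechanism

Chinese abstract (translated): Different words in a sentence affect the classification result differently, and the word embedding of each word is limited by the single embedding training model used to produce it. To address these two characteristics, a dual-path convolutional neural network sentence classification model based on a word-embedding attention mechanism (AT-DouCNN) is proposed. The model combines the attention mechanism with convolutional neural networks, takes word embeddings obtained by different training algorithms as simultaneous inputs, performs convolution and pooling on each separately, and fuses them in the fully connected layer. This makes the key information in a sentence easier to extract for a specific classification task and makes effective use of different kinds of word embeddings to obtain richer sentence features, which in turn improves classification accuracy. The experimental results show that the proposed model reaches classification accuracies of 50.6%, 88.6% and 95.4% on three public datasets, respectively, indicating good sentence classification performance.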

