An answer selection model based on long short-term memory (LSTM) and decay self-attention (DALSTM) was proposed to address the insufficient extraction of sentence features and of the semantic information shared between sentences in answer selection. Contextual semantic information was captured more fully by an encoding layer that combined LSTM with decay self-attention, and the decay matrix alleviated the over-concentration of attention weights on keywords caused by repeatedly applying the attention mechanism. An attention mechanism was then used for bidirectional interaction between the question and the answer, integrating the similarity features of question-answer pairs and enriching the semantic information shared between them. DALSTM was evaluated on the WikiQA, TrecQA, and InsuranceQA datasets. Results showed that, compared with other advanced BiLSTM-based models, DALSTM achieved better overall performance, with mean reciprocal rank (MRR) reaching 0.757, 0.871, and 0.743 on the three datasets, respectively.
Qiao-hong CHEN, Fei-yu LI, Qi SUN, Yu-bo JIA. Answer selection model based on LSTM and decay self-attention. Journal of Zhejiang University (Engineering Science), 2022, 56(12): 2436-2444.
Fig.1 Framework of the answer selection model based on LSTM and decay self-attention
Fig.2 Structure of LSTM unit
Fig.3 Structure of multi-head attention
Fig.4 Decay matrix
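Fig.3 and Fig.4 refer to the multi-head attention structure and the decay matrix used in the encoding layer. As a rough, non-authoritative sketch: the PyTorch snippet below shows one plausible way a decay matrix could be folded into scaled dot-product self-attention to damp over-concentrated keyword weights (single head for brevity). The function name, the element-wise damping, the row re-normalization, and the placeholder decay values are assumptions; the paper's exact construction of the matrix in Fig.4 is not reproduced on this page.

import torch
import torch.nn.functional as F

def decay_self_attention(x, w_q, w_k, w_v, decay):
    # x:     (L, d_model) token representations from the LSTM encoder
    # w_q/w_k/w_v: (d_model, d_k) projection matrices
    # decay: (L, L) matrix with entries in (0, 1] that damps attention weights;
    #        this element-wise form is an assumption, not the paper's definition
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = (q @ k.transpose(0, 1)) / (k.shape[-1] ** 0.5)   # scaled dot-product
    weights = F.softmax(scores, dim=-1)                       # standard self-attention weights
    weights = weights * decay                                 # damp over-focused positions
    weights = weights / weights.sum(dim=-1, keepdim=True)     # re-normalize each row
    return weights @ v                                        # (L, d_k) attended representations

# Toy usage with random tensors and a uniform placeholder decay matrix
L, d_model, d_k = 12, 256, 64
x = torch.randn(L, d_model)
w_q, w_k, w_v = torch.randn(d_model, d_k), torch.randn(d_model, d_k), torch.randn(d_model, d_k)
out = decay_self_attention(x, w_q, w_k, w_v, torch.full((L, L), 0.9))   # shape (12, 64)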
Dataset       N (Train)   N (Validation)   N (Test)   NA     LA
WikiQA        873         126              243        9.8    25.2
TrecQA        1162        68               65         38.4   30.3
InsuranceQA   16686       1854             1000       —      112
Tab.1 Statistics of datasets
Model              WikiQA MAP   WikiQA MRR   TrecQA MAP   TrecQA MRR   InsuranceQA P@1   InsuranceQA MRR
CNN                0.6204       0.6365       0.661        0.742        0.348             0.4861
BiLSTM             0.6174       0.6310       0.636        0.715        0.533             0.6597
CNN+BiLSTM         0.6560       0.6737       0.678        0.752        0.620             0.6680
Attention+BiLSTM   0.6381       0.6537       0.711        0.801        0.657             0.6750
ABCNN              0.6910       0.7127       —            —            0.643             0.6720
BERT               0.7530       0.7700       0.877        0.927        0.723             0.7490
DALSTM             0.7460       0.7570       0.826        0.871        0.708             0.7430
Tab.2 Comparison results of different models on three datasets
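Tab.2 reports MAP and MRR on WikiQA and TrecQA, and P@1 and MRR on InsuranceQA. For readers unfamiliar with these ranking metrics, the plain-Python sketch below shows how they are conventionally computed for answer selection; it assumes each question's candidate labels are already sorted by the model's score in descending order (1 = correct, 0 = incorrect) and is not the authors' evaluation code.

def precision_at_1(ranked):
    # Fraction of questions whose top-scored candidate is a correct answer
    return sum(labels[0] == 1 for labels in ranked) / len(ranked)

def mean_reciprocal_rank(ranked):
    # Average of 1 / rank of the first correct answer for each question
    total = 0.0
    for labels in ranked:
        rank = next((i for i, y in enumerate(labels, start=1) if y == 1), None)
        total += 1.0 / rank if rank else 0.0
    return total / len(ranked)

def mean_average_precision(ranked):
    # Average precision per question (precision at each correct hit), averaged over questions
    ap_sum = 0.0
    for labels in ranked:
        hits, precisions = 0, []
        for i, y in enumerate(labels, start=1):
            if y == 1:
                hits += 1
                precisions.append(hits / i)
        ap_sum += sum(precisions) / hits if hits else 0.0
    return ap_sum / len(ranked)

# Example: two questions whose candidates are sorted by predicted score
ranked = [[0, 1, 0], [1, 0, 0, 0]]
print(precision_at_1(ranked), mean_reciprocal_rank(ranked), mean_average_precision(ranked))   # 0.5 0.75 0.75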
Module               P@1     MRR
w/o attention        0.692   0.734
w/o decay            0.689   0.731
w/o self-attention   0.675   0.723
w/o BiLSTM           0.657   0.685
DALSTM               0.708   0.743
Tab.3 Results of model ablation experiments on InsuranceQA dataset
Fig.5 Performance of comparative experiments on the InsuranceQA dataset
Fig.6 Attention weights of different self-attention layers on the InsuranceQA dataset