Journal of ZheJiang University (Engineering Science)  2022, Vol. 56 Issue (12): 2436-2444    DOI: 10.3785/j.issn.1008-973X.2022.12.012
    
Answer selection model based on LSTM and decay self-attention
Qiao-hong CHEN(),Fei-yu LI,Qi SUN,Yu-bo JIA
School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China

Abstract  

An answer selection model based on long short-term memory (LSTM) and decay self-attention (DALSTM) was proposed, aiming at the problem that sentence features and the related semantic information between sentences are insufficiently extracted during answer selection. Contextual semantic information was extracted more fully by an encoding layer that combines LSTM with decay self-attention, and the problem of attention weights over-concentrating on keywords, caused by repeated use of the attention mechanism, was alleviated by a decay matrix. The attention mechanism was then used for bidirectional interaction between the question and answer representations, fusing the similarity features of question-answer pairs and enriching the related semantic information between them. DALSTM was evaluated on the WiKiQA, TrecQA and InsuranceQA data sets. Results showed that, compared with other advanced BiLSTM-based models, DALSTM had better overall performance, with the mean reciprocal rank (MRR) on the three data sets reaching 0.757, 0.871 and 0.743, respectively.
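The decay mechanism sketched in the abstract can be illustrated as follows. This is a minimal sketch only: the paper specifies its decay matrix in Fig.4, so the exponential, distance-based form and the `decay_rate` parameter used here are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def decay_self_attention(X, decay_rate=0.9):
    """Scaled dot-product self-attention whose score matrix is damped by a
    distance-based decay matrix, so that stacking attention layers does not
    over-concentrate weight on a few keywords.

    X: (seq_len, d) token representations (e.g. BiLSTM outputs).
    """
    seq_len, d = X.shape
    scores = X @ X.T / np.sqrt(d)                     # (seq_len, seq_len)
    # Decay matrix D[i, j] shrinks as the token distance |i - j| grows
    # (assumed exponential form for illustration).
    idx = np.arange(seq_len)
    D = decay_rate ** np.abs(idx[:, None] - idx[None, :])
    scores = scores * D
    # Row-wise softmax gives the damped attention weights.
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ X                                # (seq_len, d)
```

In the full model this layer would sit on top of the BiLSTM encoder for both the question and the candidate answer before the bidirectional question-answer interaction.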



Key words: question answering (QA); answer selection; long short-term memory (LSTM); decay self-attention; attention mechanism
Received: 07 January 2022      Published: 03 January 2023
CLC:  TP 391  
Fund: Young and Middle-aged Backbone Talent Cultivation Program of Zhejiang Sci-Tech University
Cite this article:

Qiao-hong CHEN,Fei-yu LI,Qi SUN,Yu-bo JIA. Answer selection model based on LSTM and decay self-attention. Journal of ZheJiang University (Engineering Science), 2022, 56(12): 2436-2444.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2022.12.012     OR     https://www.zjujournals.com/eng/Y2022/V56/I12/2436


Fig.1 Framework of answer selection model based on LSTM and decay self-attention
Fig.2 Structure of LSTM unit
Fig.3 Structure of multihead attention
Fig.4 Decay matrix
Dataset | Train (N) | Valid (N) | Test (N) | NA | LA
--- | --- | --- | --- | --- | ---
WiKiQA | 873 | 126 | 243 | 9.8 | 25.2
TrecQA | 1162 | 68 | 65 | 38.4 | 30.3
InsuranceQA | 16686 | 1854 | 1000 | — | 112
Tab.1 Statistics of datasets
Model | MAP (WiKiQA) | MRR (WiKiQA) | MAP (TrecQA) | MRR (TrecQA) | P@1 (InsuranceQA) | MRR (InsuranceQA)
--- | --- | --- | --- | --- | --- | ---
CNN | 0.6204 | 0.6365 | 0.661 | 0.742 | 0.348 | 0.4861
BiLSTM | 0.6174 | 0.6310 | 0.636 | 0.715 | 0.533 | 0.6597
CNN+BiLSTM | 0.6560 | 0.6737 | 0.678 | 0.752 | 0.620 | 0.6680
Attention+BiLSTM | 0.6381 | 0.6537 | 0.711 | 0.801 | 0.657 | 0.6750
ABCNN | 0.6910 | 0.7127 | — | — | 0.643 | 0.6720
BERT | 0.7530 | 0.7700 | 0.877 | 0.927 | 0.723 | 0.7490
DALSTM | 0.7460 | 0.7570 | 0.826 | 0.871 | 0.708 | 0.7430
Tab.2 Comparison result of different models on three datasets
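The MAP, MRR and P@1 figures in the tables above are standard answer-ranking metrics, computed per question over the score-ranked candidate list and then averaged over questions. A minimal sketch (function names are my own, not from the paper):

```python
def reciprocal_rank(rels):
    """rels: 0/1 relevance labels of one question's candidate answers,
    sorted by model score, best first. Returns 1/rank of first hit."""
    for rank, rel in enumerate(rels, start=1):
        if rel:
            return 1.0 / rank
    return 0.0

def average_precision(rels):
    """Mean of precision@k over the ranks k where a correct answer appears."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def precision_at_1(rels):
    """1.0 if the top-ranked candidate is correct, else 0.0."""
    return float(rels[0])

def mean_over_questions(metric, ranked_lists):
    """MAP, MRR and P@1 are the per-question metrics averaged over questions."""
    return sum(metric(r) for r in ranked_lists) / len(ranked_lists)
```

For example, `mean_over_questions(reciprocal_rank, lists)` gives the MRR reported for each data set, and `mean_over_questions(average_precision, lists)` gives MAP.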
Module | P@1 | MRR
--- | --- | ---
− attention | 0.692 | 0.734
− decay | 0.689 | 0.731
− self-attention | 0.675 | 0.723
− BiLSTM | 0.657 | 0.685
DALSTM (full) | 0.708 | 0.743
Tab.3 Results of model ablation experiments on InsuranceQA dataset
Fig.5 Comparative experiment performance on InsuranceQA dataset
Fig.6 Attention weight of different self-attention layers on InsuranceQA dataset