Answer selection model based on LSTM and decay self-attention
Qiao-hong CHEN, Fei-yu LI, Qi SUN, Yu-bo JIA
School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China |
Abstract An answer selection model based on long short-term memory (LSTM) and decay self-attention (DALSTM) was proposed to address the insufficient extraction of sentence features and of the related semantic information between sentences in the answer selection process. Contextual semantic information was extracted more fully by an encoding layer combining LSTM with decay self-attention, and the over-concentration of attention weights on a few keywords caused by repeated use of the attention mechanism was alleviated by a decay matrix. The attention mechanism was then used for bidirectional interaction between the question and the answer, fusing the similarity features of question-answer pairs and enriching the related semantic information between them. DALSTM was evaluated on the WikiQA, TrecQA and InsuranceQA datasets. Results showed that, compared with other advanced BiLSTM-based models, DALSTM achieved better overall performance, with mean reciprocal rank (MRR) reaching 0.757, 0.871 and 0.743 on the three datasets, respectively.
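The following sketch is a reading aid only: it shows one plausible way to damp self-attention weights with a distance-based decay matrix so that weight cannot pile up on a few keywords, which is the effect the abstract attributes to the decay matrix. It is written in NumPy under our own assumptions (the decay_rate parameter and the decay_rate ** |i - j| form are hypothetical) and is not the authors' exact formulation.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def decay_self_attention(h, decay_rate=0.9):
    # h: (seq_len, dim) hidden states, e.g. the outputs of a (Bi)LSTM encoder
    seq_len, dim = h.shape
    scores = h @ h.T / np.sqrt(dim)           # scaled dot-product scores
    weights = softmax(scores, axis=-1)        # ordinary self-attention weights
    # Decay matrix: entries shrink as the token distance |i - j| grows, which
    # limits how much every position can concentrate on one dominant keyword
    # (an illustrative choice, not the paper's exact definition).
    idx = np.arange(seq_len)
    decay = decay_rate ** np.abs(idx[:, None] - idx[None, :])
    weights = weights * decay
    weights = weights / weights.sum(axis=-1, keepdims=True)  # renormalize rows
    return weights @ h                        # re-weighted context vectors

# Toy usage: 5 tokens with 8-dimensional encoder states.
context = decay_self_attention(np.random.randn(5, 8))
print(context.shape)  # (5, 8)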
Received: 07 January 2022
Published: 03 January 2023
Fund: Young and Middle-aged Backbone Talent Training Fund of Zhejiang Sci-Tech University
Keywords: question answering (QA); answer selection; long short-term memory (LSTM); decay self-attention; attention mechanism
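For reference, the MRR figures reported in the abstract above are the mean, over all questions, of the reciprocal rank at which the first correct answer appears in the model's ranked candidate list. A minimal illustrative implementation (the function and variable names are ours, not from the paper):

def mean_reciprocal_rank(ranked_labels):
    # ranked_labels: one list per question, ordered by model score;
    # 1 marks a correct answer, 0 an incorrect one.
    total = 0.0
    for labels in ranked_labels:
        rr = 0.0
        for rank, label in enumerate(labels, start=1):
            if label == 1:
                rr = 1.0 / rank
                break
        total += rr
    return total / len(ranked_labels)

# Two questions: first correct answer at rank 1 and at rank 2, so MRR = (1 + 0.5) / 2.
print(mean_reciprocal_rank([[1, 0, 0], [0, 1, 0]]))  # 0.75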