浙江大学学报(工学版)  2022, Vol. 56 Issue (12): 2436-2444    DOI: 10.3785/j.issn.1008-973X.2022.12.012
Computer Technology
基于LSTM与衰减自注意力的答案选择模型
陈巧红(),李妃玉,孙麒,贾宇波
浙江理工大学 计算机科学与技术学院,浙江 杭州 310018
Answer selection model based on LSTM and decay self-attention
Qiao-hong CHEN(),Fei-yu LI,Qi SUN,Yu-bo JIA
School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
摘要:

针对答案选择过程中存在语句特征、语句间的相关语义信息提取不充分的问题,在长短时记忆网络(LSTM)的基础上,提出基于LSTM和衰减自注意力的答案选择模型(DALSTM). DALSTM使用LSTM和衰减自注意力编码层提取丰富的上下文语义信息,通过衰减矩阵缓解反复使用注意力机制出现的权重过集中于关键词的问题. 使用注意力机制对问题与答案间的信息进行双向交互,融合问答对间的相似性特征,丰富问答对间的相关语义信息. 在WiKiQA、TrecQA及InsuranceQA数据集上的模型评估结果表明,相较于其他基于BiLSTM的先进模型,DALSTM的整体性能表现更好,3个数据集的平均倒数排名(MRR)分别达到0.757、0.871、0.743.

关键词: 问答(QA); 答案选择; 长短时记忆(LSTM); 衰减自注意力; 注意力机制
Abstract:

An answer selection model based on long short-term memory (LSTM) and decay self-attention (DALSTM) was proposed to address the insufficient extraction of sentence features and of the related semantic information between sentences in the answer selection process. DALSTM used an LSTM and decay self-attention encoding layer to extract rich contextual semantic information, and a decay matrix was used to alleviate the over-concentration of attention weights on keywords caused by the repeated use of the attention mechanism. An attention mechanism was used for bidirectional interaction between the question and answer information, fusing the similarity features of question-answer pairs and enriching the related semantic information between them. DALSTM was evaluated on the WiKiQA, TrecQA and InsuranceQA data sets. Evaluation results showed that, compared with other advanced BiLSTM-based models, DALSTM achieved better overall performance, with mean reciprocal rank (MRR) on the three data sets reaching 0.757, 0.871 and 0.743, respectively.
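The abstract describes scaling attention scores by a decay matrix so that repeated self-attention does not concentrate all weight on a few keywords. The paper's exact formulation is not given on this page; the following is a minimal illustrative sketch, assuming the decay matrix down-weights score entries by a power of the token distance (`gamma` and the elementwise product are assumptions, not the authors' definition):

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def decay_self_attention(X, gamma=0.9):
    """Dot-product self-attention with a distance-based decay matrix.

    X: (seq_len, d) token representations, e.g. BiLSTM outputs.
    gamma: assumed decay base in (0, 1); the score between positions
    i and j is scaled by gamma**|i - j|, spreading weight away from
    a few dominant keywords.
    """
    n, d = X.shape
    scores = X @ X.T / np.sqrt(d)                  # raw attention scores (n, n)
    dist = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
    D = gamma ** dist                              # decay matrix (n, n)
    weights = softmax(scores * D, axis=-1)         # decayed attention weights
    return weights @ X                             # re-weighted representations
```

The key point of the sketch is that the decay acts on the score matrix before the softmax, so each row still sums to 1 while distant, repeatedly attended positions receive flatter weights.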

Key words: question answering (QA)    answer selection    long short-term memory (LSTM)    decay self-attention    attention mechanism
Received: 2022-01-07    Published: 2023-01-03
CLC:  TP 391  
Supported by the Young and Middle-aged Backbone Talent Training Fund of Zhejiang Sci-Tech University
About the author: CHEN Qiao-hong (b. 1978), female, associate professor; research interests: intelligent optimal design of parallel robots and machine learning. orcid.org/0000-0003-0595-341X. E-mail: chen_lisa@zstu.edu.cn

Cite this article:

陈巧红,李妃玉,孙麒,贾宇波. 基于LSTM与衰减自注意力的答案选择模型[J]. 浙江大学学报(工学版), 2022, 56(12): 2436-2444.

Qiao-hong CHEN,Fei-yu LI,Qi SUN,Yu-bo JIA. Answer selection model based on LSTM and decay self-attention. Journal of ZheJiang University (Engineering Science), 2022, 56(12): 2436-2444.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2022.12.012        https://www.zjujournals.com/eng/CN/Y2022/V56/I12/2436

Fig. 1  Framework of the answer selection model based on LSTM and decay self-attention
Fig. 2  Structure of the LSTM cell
Fig. 3  Structure of multi-head attention
Fig. 4  Decay matrix
Dataset      N (train)  N (dev)  N (test)  NA    LA
WiKiQA       873        126      243       9.8   25.2
TrecQA       1162       68       65        38.4  30.3
InsuranceQA  16686      1854     1000      112
Table 1  Statistics of the data sets
Model             WiKiQA          TrecQA         InsuranceQA
                  MAP     MRR     MAP    MRR     P@1    MRR
CNN               0.6204  0.6365  0.661  0.742   0.348  0.4861
BiLSTM            0.6174  0.6310  0.636  0.715   0.533  0.6597
CNN+BiLSTM        0.6560  0.6737  0.678  0.752   0.620  0.6680
Attention+BiLSTM  0.6381  0.6537  0.711  0.801   0.657  0.6750
ABCNN             0.6910  0.7127  -      -       0.643  0.6720
BERT              0.7530  0.7700  0.877  0.927   0.723  0.7490
DALSTM            0.7460  0.7570  0.826  0.871   0.708  0.7430
Table 2  Comparison of experimental results of different models on three data sets
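MRR, the metric reported above and in the abstract, is the mean over all questions of the reciprocal rank of the first correct answer. A minimal sketch (the rank list passed in the usage example is hypothetical, not the paper's data):

```python
def mean_reciprocal_rank(first_correct_ranks):
    """MRR: average of 1/rank over all questions, where
    first_correct_ranks[i] is the 1-based position at which the
    first correct answer for question i was ranked."""
    return sum(1.0 / r for r in first_correct_ranks) / len(first_correct_ranks)

# e.g. three questions whose first correct answers ranked 1st, 2nd and 4th:
mean_reciprocal_rank([1, 2, 4])  # ≈ 0.583
```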
Module           P@1    MRR
−attention       0.692  0.734
−decay           0.689  0.731
−self-attention  0.675  0.723
−BiLSTM          0.657  0.685
DALSTM           0.708  0.743
Table 3  Results of ablation experiments on InsuranceQA data set
Fig. 5  Performance comparison on InsuranceQA data set
Fig. 6  Attention weights of different self-attention layers on InsuranceQA data set
1 ZHANG Y T, LU W P, OU W H, et al Chinese medical question answer selection via hybrid models based on CNN and GRU[J]. Multimedia Tools and Applications, 2020, 79: 14751- 14776
doi: 10.1007/s11042-019-7240-1
2 LIU D L, NIU Z D, ZHANG C X, et al Multi-scale deformable CNN for answer selection[J]. IEEE Access, 2019, 7: 164986- 164995
doi: 10.1109/ACCESS.2019.2953219
3 李超凡, 陈羽中 一种用于答案选择的知识增强混合神经网络[J]. 小型微型计算机系统, 2021, 42 (10): 2065- 2073
LI Chao-Fan, CHEN Yu-Zhong Knowledge-enhanced hybrid neural network for answer selection[J]. Journal of Chinese Computer Systems, 2021, 42 (10): 2065- 2073
doi: 10.3969/j.issn.1000-1220.2021.10.009
4 WAKCHAURE M, KULKARNI P. A scheme of answer selection in community question answering using machine learning techniques [C]// 2019 International Conference on Intelligent Computing and Control Systems. Madurai: IEEE, 2019: 879-883.
5 MA W, LOU J, JI C, et al ACLSTM: a novel method for CQA answer quality prediction based on question-answer joint learning[J]. Computers, Materials and Continua, 2021, 66 (1): 179- 193
6 石磊, 王毅, 成颖, 等 自然语言处理中的注意力机制研究综述[J]. 数据分析与知识发现, 2020, 41 (5): 1- 14
SHI Lei, WANG Yi, CHENG Ying, et al Review of attention mechanism in natural language processing[J]. Data Analysis and Knowledge Discovery, 2020, 41 (5): 1- 14
7 YU A W, DOHAN D, LUONG M T, et al. QANet: combining local convolution with global self-attention for reading comprehension [EB/OL]. [2021-01-29]. https://arxiv.org/pdf/1804.09541.pdf.
8 CHEN X C, YANG Z Y, LIANG N Y, et al Co-attention fusion based deep neural network for Chinese medical answer selection[J]. Applied Intelligence, 2021, 51: 6633- 6646
doi: 10.1007/s10489-021-02212-w
9 TAY Y, TUAN L A, HUI S C. Multi-cast attention networks for retrieval-based question answering and response prediction [EB/OL]. [2022-01-07]. https://arxiv.org/pdf/1806.00778.pdf.
10 BAO G C, WEI Y, SUN X, et al Double attention recurrent convolution neural network for answer selection[J]. Royal Society Open Science, 2020, 7: 191517
doi: 10.1098/rsos.191517
11 江龙泉. 基于Attentive LSTM网络模型的答案匹配技术的研究[D]. 上海: 上海师范大学, 2018.
JIANG Long-quan. Research on answer matching technology based on Attentive LSTM network model [D]. Shanghai: Shanghai Normal University, 2018.
12 VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [EB/OL]. [2022-01-07]. https://arxiv.org/pdf/1706.03762.pdf.
13 YU A W, DOHAN D, LUONG M T, et al. QANet: combining local convolution with global self-attention for reading comprehension [EB/OL]. [2022-01-07]. https://arxiv.org/pdf/1804.09541.pdf.
14 SHAO T H, GUO Y P, CHEN H H, et al Transformer-based neural network for answer selection in question answering[J]. IEEE Access, 2019, 7: 26146- 26156
doi: 10.1109/ACCESS.2019.2900753
15 PETERS M E, NEUMANN M, IYYER M, et al. Deep contextualized word representations [C]// Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics. [S.l.]: ACL, 2018: 2227-2237.
16 RADFORD A, NARASIMHAN K, SALIMANS T, et al. Improving language understanding by generative pre-training [R/OL]. [2022-01-07]. https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf.
17 DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding [C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics. [S.l.]: ACL, 2019: 4171-4186.
18 BROMLEY J, BENTZ J W, BOTTOU L, et al Signature verification using a “Siamese” time delay neural network[J]. International Journal of Pattern Recognition and Artificial Intelligence, 1993, 7 (4): 669- 688
doi: 10.1142/S0218001493000339
19 HOCHREITER S, SCHMIDHUBER J Long short-term memory[J]. Neural Computation, 1997, 9 (8): 1735- 1780
doi: 10.1162/neco.1997.9.8.1735
20 俞海亮, 彭冬亮, 谷雨 结合双层多头自注意力和BiLSTM-CRF的军事武器实体识别[J]. 无线电工程, 2022, 52 (5): 775- 782
YU Hai-liang, PENG Dong-liang, GU Yu Military weapon entity recognition combined with double-layer multi-head self-attention and BiLSTM-CRF[J]. Radio Engineering, 2022, 52 (5): 775- 782
doi: 10.3969/j.issn.1003-3106.2022.05.011
21 BIAN W J, LI S, YANG Z, et al. A compare-aggregate model with dynamic-clip attention for answer selection [C]// Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. [S.l.]: ACM, 2017: 1987-1990.
22 YIN W P, SCHÜTZE H, XIANG B, et al ABCNN: attention-based convolutional neural network for modeling sentence pairs[J]. Transactions of the Association for Computational Linguistics, 2016, 4: 259- 272
doi: 10.1162/tacl_a_00097
23 LECUN Y, CHOPRA S, HADSELL R, et al. A tutorial on energy-based learning [EB/OL]. [2022-01-07]. https://typeset.io/pdf/a-tutorial-on-energy-based-learning-2fj3lvviwy.pdf.
24 YANG Y, YIH W T, MEEK C. WikiQA: a challenge dataset for open-domain question answering [C]// Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. [S. l.]: ACL, 2015: 2013-2018
25 WANG M Q, SMITH N A, MITAMURA T. What is the jeopardy model? A quasi-synchronous grammar for QA [C]// Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. [S. l.]: ACL, 2007: 22-32.
26 FENG M W, XIANG B, GLASS M R, et al. Applying deep learning to answer selection: a study and an open task [EB/OL]. [2022-01-07]. https://arxiv.org/pdf/1508.01585.pdf.
27 PENNINGTON J, SOCHER R, MANNING C. GloVe: global vectors for word representation [C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. [S.l.]: ACL, 2014: 1532-1543.