An answer selection model based on long short-term memory (LSTM) and decay self-attention (DALSTM) was proposed to address the insufficient extraction of sentence features and of the semantic information shared between sentences in answer selection. Contextual semantic information was captured more fully by an encoding layer that combined LSTM with decay self-attention, and the decay matrix alleviated the over-concentration of attention weights on keywords caused by repeatedly applying the attention mechanism. An attention mechanism was then used for bidirectional interaction between the question and the answer, integrating the similarity features of question-answer pairs and enriching the semantic information shared between them. DALSTM was evaluated on the WikiQA, TrecQA, and InsuranceQA datasets. Results showed that, compared with other advanced BiLSTM-based models, DALSTM achieved better overall performance, with mean reciprocal rank (MRR) reaching 0.757, 0.871, and 0.743 on the three datasets, respectively.
Qiao-hong CHEN, Fei-yu LI, Qi SUN, Yu-bo JIA. Answer selection model based on LSTM and decay self-attention. Journal of Zhejiang University (Engineering Science), 2022, 56(12): 2436-2444.
Fig.1 Framework of the answer selection model based on LSTM and decay self-attention
Fig.2 Structure of LSTM unit
Fig.3 Structure of multi-head attention
Fig.4 Decay matrix
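Fig.3 and Fig.4 refer to the multi-head attention structure and the decay matrix used in the encoding layer. As a rough, non-authoritative sketch: the PyTorch snippet below shows one plausible way a decay matrix could be folded into scaled dot-product self-attention to damp over-concentrated keyword weights (single head for brevity). The function name, the element-wise damping, the row re-normalization, and the placeholder decay values are assumptions; the paper's exact construction of the matrix in Fig.4 is not reproduced on this page.

import torch
import torch.nn.functional as F

def decay_self_attention(x, w_q, w_k, w_v, decay):
    # x:     (L, d_model) token representations from the LSTM encoder
    # w_q/w_k/w_v: (d_model, d_k) projection matrices
    # decay: (L, L) matrix with entries in (0, 1] that damps attention weights;
    #        this element-wise form is an assumption, not the paper's definition
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = (q @ k.transpose(0, 1)) / (k.shape[-1] ** 0.5)   # scaled dot-product
    weights = F.softmax(scores, dim=-1)                       # standard self-attention weights
    weights = weights * decay                                 # damp over-focused positions
    weights = weights / weights.sum(dim=-1, keepdim=True)     # re-normalize each row
    return weights @ v                                        # (L, d_k) attended representations

# Toy usage with random tensors and a uniform placeholder decay matrix
L, d_model, d_k = 12, 256, 64
x = torch.randn(L, d_model)
w_q, w_k, w_v = torch.randn(d_model, d_k), torch.randn(d_model, d_k), torch.randn(d_model, d_k)
out = decay_self_attention(x, w_q, w_k, w_v, torch.full((L, L), 0.9))   # shape (12, 64)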
Dataset       N (Train)   N (Validation)   N (Test)   NA     LA
WikiQA        873         126              243        9.8    25.2
TrecQA        1162        68               65         38.4   30.3
InsuranceQA   16686       1854             1000       —      112
Tab.1 Statistics of datasets
Model              WikiQA MAP   WikiQA MRR   TrecQA MAP   TrecQA MRR   InsuranceQA P@1   InsuranceQA MRR
CNN                0.6204       0.6365       0.661        0.742        0.348             0.4861
BiLSTM             0.6174       0.6310       0.636        0.715        0.533             0.6597
CNN+BiLSTM         0.6560       0.6737       0.678        0.752        0.620             0.6680
Attention+BiLSTM   0.6381       0.6537       0.711        0.801        0.657             0.6750
ABCNN              0.6910       0.7127       —            —            0.643             0.6720
BERT               0.7530       0.7700       0.877        0.927        0.723             0.7490
DALSTM             0.7460       0.7570       0.826        0.871        0.708             0.7430
Tab.2 Comparison results of different models on three datasets
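Tab.2 reports MAP and MRR on WikiQA and TrecQA, and P@1 and MRR on InsuranceQA. For readers unfamiliar with these ranking metrics, the plain-Python sketch below shows how they are conventionally computed for answer selection; it assumes each question's candidate labels are already sorted by the model's score in descending order (1 = correct, 0 = incorrect) and is not the authors' evaluation code.

def precision_at_1(ranked):
    # Fraction of questions whose top-scored candidate is a correct answer
    return sum(labels[0] == 1 for labels in ranked) / len(ranked)

def mean_reciprocal_rank(ranked):
    # Average of 1 / rank of the first correct answer for each question
    total = 0.0
    for labels in ranked:
        rank = next((i for i, y in enumerate(labels, start=1) if y == 1), None)
        total += 1.0 / rank if rank else 0.0
    return total / len(ranked)

def mean_average_precision(ranked):
    # Average precision per question (precision at each correct hit), averaged over questions
    ap_sum = 0.0
    for labels in ranked:
        hits, precisions = 0, []
        for i, y in enumerate(labels, start=1):
            if y == 1:
                hits += 1
                precisions.append(hits / i)
        ap_sum += sum(precisions) / hits if hits else 0.0
    return ap_sum / len(ranked)

# Example: two questions whose candidates are sorted by predicted score
ranked = [[0, 1, 0], [1, 0, 0, 0]]
print(precision_at_1(ranked), mean_reciprocal_rank(ranked), mean_average_precision(ranked))   # 0.5 0.75 0.75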
Module               P@1     MRR
w/o attention        0.692   0.734
w/o decay            0.689   0.731
w/o self-attention   0.675   0.723
w/o BiLSTM           0.657   0.685
DALSTM               0.708   0.743
Tab.3 Results of model ablation experiments on InsuranceQA dataset
Fig.5 Performance of comparative experiments on the InsuranceQA dataset
Fig.6 Attention weights of different self-attention layers on the InsuranceQA dataset