Journal of Zhejiang University (Engineering Science)  2021, Vol. 55, Issue (12): 2352-2358    DOI: 10.3785/j.issn.1008-973X.2021.12.015
Computer Technology
Text matching model based on dense connection network and multi-dimensional feature fusion
Yue-lin CHEN1, Wen-jing TIAN1, Xiao-dong CAI2,*, Shu-ting ZHENG2
1. School of Mechanical and Electrical Engineering, Guilin University of Electronic Technology, Guilin 541004, China
2. School of Information and Communication, Guilin University of Electronic Technology, Guilin 541004, China
Abstract:

A text matching method based on a dense connection network and multi-dimensional feature fusion was proposed to address semantic loss and insufficient information interaction between sentence pairs in the text matching process. At the encoding end of the model, a BiLSTM network was used to encode each sentence and obtain its contextual semantic features. The dense connection network connected the bottom-level word-embedding features with the top-level dense-module features, enriching the semantic features of the sentences. Based on word-level information interaction through an attention mechanism, the similarity features, difference features and key features of sentence pairs were fused, enabling the model to capture more semantic relationships between the sentence pairs. The model was evaluated on four benchmark datasets; compared with other strong baseline models, its text matching accuracy improved significantly, by 0.3%, 0.3%, 0.6% and 1.81%, respectively. Validation experiments on the Quora paraphrase identification dataset showed that the proposed method matched the semantic similarity of sentences accurately.
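The dense-connection idea described above (re-joining the bottom-level word embeddings with the top-level dense-module output, with every layer seeing all earlier features) can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's implementation; the `tanh` projection is a stand-in for the BiLSTM layers the model actually uses, and all names and dimensions are illustrative assumptions:

```python
import numpy as np

def dense_encode(x_embed, weights):
    """Dense-connection sketch: each layer receives the concatenation of
    the word embeddings and all previous layer outputs, and the final
    representation re-joins the bottom embedding with the top feature.
    x_embed: (seq_len, d) word embeddings; weights: per-layer matrices."""
    feats = [x_embed]
    for w in weights:
        inp = np.concatenate(feats, axis=-1)   # dense connectivity
        feats.append(np.tanh(inp @ w))         # stand-in for one BiLSTM layer
    # connect bottom-most word embeddings with top-most dense feature
    return np.concatenate([x_embed, feats[-1]], axis=-1)

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 8))                    # 6 tokens, 8-dim embeddings
weights = [rng.normal(size=(8, 8)), rng.normal(size=(16, 8))]
print(dense_encode(x, weights).shape)          # (6, 16)
```

Because the raw embeddings survive in the final concatenation, lower-level lexical information is not lost as depth grows, which is the semantic-loss problem the abstract targets.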

Key words: semantic loss; information interaction; BiLSTM network; dense connection network; attention mechanism; multi-dimensional feature fusion
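The word-level attention interaction and three-way fusion can be sketched as below. This is a hedged NumPy illustration under common conventions (ESIM-style soft alignment): the element-wise product serves as the similarity feature and the absolute difference as the difference feature, while the element-wise maximum is an assumed stand-in for the key feature, whose exact definition is not given on this page:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fuse(a, b):
    """Word-level attention interaction followed by multi-dimensional
    feature fusion. a: (len_a, d), b: (len_b, d) encoded sentences."""
    e = a @ b.T                          # (len_a, len_b) alignment scores
    b_aligned = softmax(e, axis=1) @ b   # soft-aligned b for each word of a
    sim = a * b_aligned                  # similarity feature
    diff = np.abs(a - b_aligned)         # difference feature
    key = np.maximum(a, b_aligned)       # "key" feature (assumption)
    return np.concatenate([a, b_aligned, sim, diff, key], axis=-1)

rng = np.random.default_rng(0)
a = rng.normal(size=(5, 8))
b = rng.normal(size=(7, 8))
fused = fuse(a, b)
print(fused.shape)  # (5, 40)
```

Concatenating the raw, aligned, similarity, difference and key views gives the downstream classifier several complementary signals about how the sentence pair relates.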
Received: 2021-03-22    Published: 2021-12-31
CLC:  TP 391.1  
Funding: Guangxi Science and Technology Major Project (AA20302001); Guilin Scientific Research and Technology Development Project (20190412)
Corresponding author: Xiao-dong CAI    E-mail: 370883566@qq.com; caixiaodong@guet.edu.cn
About the author: Yue-lin CHEN (b. 1963), male, professor, engaged in research on natural language processing and on image recognition and processing. orcid.org/0000-0002-8377-3986. E-mail: 370883566@qq.com

Cite this article:

Yue-lin CHEN, Wen-jing TIAN, Xiao-dong CAI, Shu-ting ZHENG. Text matching model based on dense connection network and multi-dimensional feature fusion. Journal of Zhejiang University (Engineering Science), 2021, 55(12): 2352-2358.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2021.12.015        https://www.zjujournals.com/eng/CN/Y2021/V55/I12/2352

Fig. 1  Framework of the DCN-MDFF model
Dataset        Split   Size     Example sentence pair                                             Labels
SNLI           train   549367   p: a man playing an electric guitar on stage.                     entailment, neutral, contradiction
                                q: a man playing guitar on stage.
               dev     9842
               test    9824
SciTail        train   23596    p: He grabs at the wheel to turn the car.                         entailment, neutral
                                q: The turning driveshaft causes the wheels of the car to turn.
               dev     1304
               test    2126
Quora          train   384348   p: What is the best way of living life?                           paraphrase, non-paraphrase
                                q: What is the best way to live a life?
               dev     10000
               test    10000
Ant Financial  train   92500    p: 蚂蚁借呗多长时间可以审核通过? (How long does Ant Jiebei take to pass review?)     yes, no
                                q: 借呗申请多久可以审核通过? (How long after applying for Jiebei does review pass?)
               dev     4000
               test    4000
Table 1  Sizes and examples of the datasets
Fig. 2  Comparison of matching accuracy of different models on the SNLI dataset
Fig. 3  Comparison of matching accuracy of different models on the SciTail dataset
Fig. 4  Comparison of matching accuracy of different models on the Quora dataset
Fig. 5  Comparison of matching accuracy of different models on the Ant Financial dataset
Model      Acc/%
KFF        89.6
DF         89.5
SimiF      89.2
SF         89.2
SRC        89.3
ARC        89.4
DCN-MDFF   90.0
Table 2  Ablation results on the Quora dataset
Fig. 6  Performance comparison of robustness experiments on each validation set