Journal of ZheJiang University (Engineering Science)  2021, Vol. 55 Issue (12): 2352-2358    DOI: 10.3785/j.issn.1008-973X.2021.12.015
Text matching model based on dense connection network and multi-dimensional feature fusion
Yue-lin CHEN1, Wen-jing TIAN1, Xiao-dong CAI2,*, Shu-ting ZHENG2
1. School of Mechanical and Electrical Engineering, Guilin University of Electronic Technology, Guilin 541004, China
2. School of Information and Communication, Guilin University of Electronic Technology, Guilin 541004, China
A text matching method based on a dense connection network and multi-dimensional feature fusion was proposed to address semantic loss and insufficient information interaction between sentence pairs during text matching. At the encoding end of the model, a BiLSTM network encoded each sentence to obtain its contextual semantic features. The dense connection network connected the word embedding features at the bottom layer with the dense module features at the top layer, which enriched the semantic features of the sentences. Based on word-level information interaction through the attention mechanism, the similarity features, difference features and key features of each sentence pair were combined by multi-dimensional feature fusion, so that the model captured more of the semantic relationships between the two sentences. The model was evaluated on four benchmark datasets. Compared with other strong baseline models, its text matching accuracy improved by 0.3%, 0.3%, 0.6% and 1.81% on the four datasets, respectively. A validity verification experiment on the Quora paraphrase identification dataset showed that the proposed method matched the semantic similarity of sentences accurately.
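
The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no formulas, so the element-wise product (similarity), subtraction (difference) and element-wise maximum (key features) used below are assumptions, as are all function names and dimensions. Random arrays stand in for the word embeddings and the BiLSTM/dense-module outputs.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dense_connect(embeddings, top_features):
    # Dense connection (assumed form): concatenate the bottom-level word
    # embeddings with the top-level encoder features for every word.
    return np.concatenate([embeddings, top_features], axis=-1)

def align(p, q):
    # Word-level attention: every word of p attends over the words of q,
    # and every word of q attends over the words of p.
    scores = p @ q.T                            # (len_p, len_q)
    p_aligned = softmax(scores, axis=1) @ q     # (len_p, d)
    q_aligned = softmax(scores, axis=0).T @ p   # (len_q, d)
    return p_aligned, q_aligned

def fuse(x, x_aligned):
    # Multi-dimensional feature fusion; all three operators are assumptions.
    similarity = x * x_aligned           # assumed similarity feature
    difference = x - x_aligned           # assumed difference feature
    key = np.maximum(x, x_aligned)       # assumed key feature
    return np.concatenate([x, x_aligned, similarity, difference, key], axis=-1)

# Hypothetical sizes: 5 and 7 words, 8-dim embeddings, 16-dim encoder output.
rng = np.random.default_rng(0)
emb_p, enc_p = rng.normal(size=(5, 8)), rng.normal(size=(5, 16))
emb_q, enc_q = rng.normal(size=(7, 8)), rng.normal(size=(7, 16))

p = dense_connect(emb_p, enc_p)          # shape (5, 24)
q = dense_connect(emb_q, enc_q)          # shape (7, 24)
p_aligned, q_aligned = align(p, q)
fused_p = fuse(p, p_aligned)             # shape (5, 120)
fused_q = fuse(q, q_aligned)             # shape (7, 120)
```

In a full model, the fused representations would typically be pooled and passed to a classifier; that stage is omitted here because the abstract does not describe it.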

Key words: semantic loss, information interaction, BiLSTM network, dense connection network, attention mechanism, multi-dimensional feature fusion
Received: 22 March 2021      Published: 31 December 2021
CLC:  TP 391.1  
Fund: Guangxi Science and Technology Major Project (AA20302001); Guilin Scientific Research and Technology Development Project (20190412)
Corresponding Author: Xiao-dong CAI
Yue-lin CHEN, Wen-jing TIAN, Xiao-dong CAI, Shu-ting ZHENG. Text matching model based on dense connection network and multi-dimensional feature fusion. Journal of ZheJiang University (Engineering Science), 2021, 55(12): 2352-2358.

Fig.1 Framework of the DCN-MDFF model
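| Dataset | Split | Size | Example | Label |
|---|---|---|---|---|
| SNLI | train | 549367 | p: a man playing an electric guitar on stage. q: a man playing guitar on stage. | |
| SNLI | dev | 9842 | | |
| SNLI | test | 9824 | | |
| SciTail | train | 23596 | p: He grabs at the wheel to turn the car. q: The turning driveshaft causes the wheels of the car to turn. | |
| SciTail | dev | 1304 | | |
| SciTail | test | 2126 | | |
| Quora | train | 384348 | p: What is the best way of living life? q: What is the best way to live a life? | |
| Quora | dev | 10000 | | |
| Quora | test | 10000 | | |
| Ant Financial | train | 92500 | p: 蚂蚁借呗多长时间可以审核通过? (How long does it take for Ant Jiebei to be approved?) q: 借呗申请多久可以审核通过? (How long does a Jiebei application take to be approved?) | |
| Ant Financial | dev | 4000 | | |
| Ant Financial | test | 4000 | | |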
Tab.1 Size and examples of different data sets
Fig.2 Comparison results of matching accuracy with different models on SNLI dataset
Fig.3 Comparison results of matching accuracy with different models on SciTail dataset
Fig.4 Comparison results of matching accuracy with different models on Quora dataset
Fig.5 Comparison results of matching accuracy with different models on Ant Financial dataset
| Model | Acc/% |
|---|---|
| KFF | 89.6 |
| DF | 89.5 |
| SimiF | 89.2 |
| SF | 89.2 |
| SRC | 89.3 |
| ARC | 89.4 |
| DCN-MDFF | 90.0 |
Tab.2 Results of ablation experiments on Quora dataset
Fig.6 Comparison of robustness experiment performance on each validation set
