Journal of Zhejiang University (Engineering Science)  2026, Vol. 60 Issue (2): 379-387    DOI: 10.3785/j.issn.1008-973X.2026.02.016
Computer Technology and Control Engineering
Entity alignment method based on embedding features and sparse matrices
Chaowen FENG1,2, Chengchen GENG1,2, Yingli LIU1,2,*
1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
2. Yunnan Key Laboratory of Computer Technology Applications, Kunming University of Science and Technology, Kunming 650500, China
Abstract:

Entity alignment for multilingual knowledge fusion is limited by coarse-grained feature modeling and by incomplete use of structural information. To address this, an entity alignment method was proposed that integrated multi-level embedding features with a sparse matrix propagation mechanism. Character-level features, word-embedding features, and neighborhood relation features were fused into a unified multi-dimensional entity representation, which strengthened both local semantic expression and the modeling of structural associations. A sparse adjacency matrix was constructed from relation embeddings and combined with a feature-normalized propagation mechanism, so that information spread stably and effectively through the knowledge graph. To improve the global consistency of entity matching, Sinkhorn regularization was applied to refine the similarity matrix, and the Hungarian algorithm was then used to obtain an optimal one-to-one alignment. On multiple cross-lingual knowledge graph datasets, the method performed stably in terms of hit rate (Hits@k) and mean reciprocal rank (MRR) and remained competitive with representative methods such as SNGA and EAMI, which supports its accuracy and robustness.

Key words: knowledge graph    entity alignment    multi-level feature modeling    sparse matrix propagation    Sinkhorn regularization
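
The matching stage described in the abstract (Sinkhorn refinement of the similarity matrix followed by Hungarian assignment) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the temperature of 0.05, the 20 iterations, and the toy similarity matrix are all assumptions chosen for illustration, and the assignment routine is a textbook O(n^3) Hungarian variant. In practice, `scipy.optimize.linear_sum_assignment` solves the same assignment problem.

```python
import math


def _logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def sinkhorn(sim, temperature=0.05, n_iters=20):
    """Alternately normalise rows and columns of exp(sim / temperature).

    temperature and n_iters are illustrative defaults, not values from the paper.
    """
    log_p = [[s / temperature for s in row] for row in sim]
    for _ in range(n_iters):
        # Row step: make every row of exp(log_p) sum to 1.
        for row in log_p:
            shift = _logsumexp(row)
            row[:] = [x - shift for x in row]
        # Column step: make every column of exp(log_p) sum to 1.
        shifts = [_logsumexp(col) for col in zip(*log_p)]
        log_p = [[x - s for x, s in zip(row, shifts)] for row in log_p]
    return [[math.exp(x) for x in row] for row in log_p]


def hungarian(cost):
    """Minimum-cost assignment for an n x m cost matrix with n <= m.

    Returns a list where entry i is the column assigned to row i.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)   # row and column potentials
    p, way = [0] * (m + 1), [0] * (m + 1)     # p[j]: row matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # walk back along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment


def align(sim, temperature=0.05, n_iters=20):
    """Refine similarities with Sinkhorn, then pick the best one-to-one matching."""
    refined = sinkhorn(sim, temperature, n_iters)
    cost = [[-x for x in row] for row in refined]  # maximise total refined similarity
    return hungarian(cost)


# Toy similarity matrix (made up): sources 0 and 1 both prefer target 0.
sim = [[0.90, 0.80, 0.10],
       [0.85, 0.20, 0.10],
       [0.10, 0.30, 0.70]]
greedy = [max(range(len(row)), key=row.__getitem__) for row in sim]
print(greedy)
print(align(sim))
```

On this toy matrix, row-wise argmax yields `[0, 0, 2]`, mapping two source entities to the same target, whereas `align` returns the one-to-one matching `[1, 0, 2]`. This is the kind of global consistency that the abstract attributes to the combination of Sinkhorn refinement and Hungarian assignment.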
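Received: 2025-03-06    Published: 2026-02-03
CLC: TP 391
Funding: National Natural Science Foundation of China (52061020); Yunnan Major Science and Technology Special Program (202302AG050009); Open Fund of the Yunnan Key Laboratory of Computer Technology Applications (2024G05).
Corresponding author: Yingli LIU    E-mail: 15236085295@163.com;lyl@kust.edu.cn
About the author: Chaowen FENG (1995-), male, master's student, engaged in knowledge graph research. orcid.org/0009-0004-7851-7509. E-mail: 15236085295@163.com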

Cite this article:

Chaowen FENG, Chengchen GENG, Yingli LIU. Entity alignment method based on embedding features and sparse matrices. Journal of Zhejiang University (Engineering Science), 2026, 60(2): 379-387.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2026.02.016
https://www.zjujournals.com/eng/CN/Y2026/V60/I2/379

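Fig. 1  Example of knowledge graph entity alignment
Fig. 2  Overall architecture of the entity alignment method integrating multi-level embedding features with a sparse matrix propagation mechanism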
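
The sparse propagation component named in the Fig. 2 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's architecture: this page does not describe how relation embeddings determine edge weights, so the optional `relation_weight` lookup is a hypothetical placeholder (uniform weights by default), and the undirected edges, self-loops, row normalisation, per-hop L2 normalisation, and hop concatenation are likewise assumed design choices.

```python
import math
from collections import defaultdict


def build_sparse_adjacency(triples, num_entities, relation_weight=None):
    """Row-normalised adjacency stored sparsely as {row: {col: weight}}.

    relation_weight is a hypothetical {relation_id: weight} lookup standing in
    for weights derived from relation embeddings; None means uniform weights.
    """
    adj = defaultdict(lambda: defaultdict(float))
    for head, rel, tail in triples:
        w = 1.0 if relation_weight is None else relation_weight[rel]
        adj[head][tail] += w
        adj[tail][head] += w      # edges treated as undirected (assumption)
    for e in range(num_entities):
        adj[e][e] += 1.0          # self-loop keeps each entity's own features
    normalised = {}
    for row, cols in adj.items():
        total = sum(cols.values())
        normalised[row] = {c: w / total for c, w in cols.items()}
    return normalised


def l2_normalise(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def propagate(adj, features, depth=2):
    """Concatenate L2-normalised features from hops 0..depth for every entity."""
    dim = len(features[0])
    current = [l2_normalise(f) for f in features]
    outputs = [list(f) for f in current]
    for _ in range(depth):
        nxt = []
        for row in range(len(features)):
            agg = [0.0] * dim
            # Only stored (non-zero) entries are visited: a sparse mat-vec product.
            for col, w in adj[row].items():
                for k in range(dim):
                    agg[k] += w * current[col][k]
            nxt.append(l2_normalise(agg))
        current = nxt
        for out, f in zip(outputs, current):
            out.extend(f)
    return outputs


# Toy graph (made up): a chain 0 - 1 - 2 with one-hot initial features.
triples = [(0, 0, 1), (1, 0, 2)]
features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
adjacency = build_sparse_adjacency(triples, num_entities=3)
embeddings = propagate(adjacency, features, depth=2)
print(len(embeddings[0]))  # dim * (depth + 1) values per entity
```

Normalising after every hop keeps feature magnitudes bounded as the depth grows, which is one plausible reading of the "stable" propagation mentioned in the abstract. The `depth` parameter corresponds loosely to the propagation depth studied in Fig. 5.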
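Dataset      Language   N_E      N_R     N_T
DBP_ZH-EN    Chinese    19,388   1,701   70,414
             English    19,572   1,323   95,142
DBP_JA-EN    Japanese   19,814   1,299   77,214
             English    19,780   1,153   93,484
DBP_FR-EN    French     19,661   903     105,998
             English    19,993   1,208   115,722
Table 1  Statistics of the DBP15K dataset (N_E, N_R, N_T: numbers of entities, relations, and triples)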
Method          DBP_ZH-EN                DBP_JA-EN                DBP_FR-EN
                Hits@1  Hits@10  MRR     Hits@1  Hits@10  MRR     Hits@1  Hits@10  MRR
MTransE[16]     0.209   0.512    0.310   0.250   0.572    0.360   0.247   0.577    0.360
GCN-Align[17]   0.434   0.762    0.550   0.427   0.762    0.540   0.411   0.772    0.530
MuGNN[18]       0.494   0.844    0.611   0.501   0.857    0.621   0.495   0.870    0.621
BootEA[19]      0.629   0.847    0.703   0.622   0.853    0.701   0.653   0.874    0.731
PSR[20]         0.802   0.935    0.851   0.803   0.938    0.852   0.828   0.952    0.874
MRAEA[21]       0.757   0.930    0.827   0.758   0.934    0.826   0.781   0.948    0.849
AttrGNN[22]     0.796   0.929    0.845   0.783   0.920    0.834   0.919   0.979    0.910
RDGCN[23]       0.697   0.842    0.750   0.763   0.897    0.810   0.873   0.950    0.901
HGCN[24]        0.720   0.857    0.760   0.766   0.897    0.812   0.892   0.961    0.910
JEANS[25]       0.719   0.895    0.791   0.737   0.914    0.798   0.769   0.940    0.827
EPEA[26]        0.885   0.953    0.911   0.924   0.969    0.942   0.955   0.986    0.967
SNGA[27]        0.987   0.997    0.991   0.991   0.998    0.994   0.998   1.000    0.999
EAMI[28]        0.935   0.982    0.950   0.939   0.978    0.950   0.987   0.996    0.990
This study      0.871   0.950    0.900   0.938   0.982    0.955   0.976   0.995    0.984
Table 2  Performance comparison of entity alignment methods on the DBP15K dataset
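
For readers unfamiliar with the metrics in Table 2, the sketch below shows one common way to compute Hits@k and MRR from a similarity matrix and a list of gold target indices. The function names, the pessimistic handling of ties, and the toy scores are assumptions for illustration; the paper's own evaluation code is not shown on this page.

```python
def rank_of_gold(scores, gold_index):
    """1-based rank of the gold candidate; ties are counted against it."""
    gold = scores[gold_index]
    return 1 + sum(1 for j, s in enumerate(scores) if j != gold_index and s >= gold)


def hits_at_k_and_mrr(sim, gold, ks=(1, 10)):
    """Return ({k: Hits@k}, MRR) over all source entities.

    sim[i][j] is the similarity of source i to target j; gold[i] is the
    index of the correct target for source i.
    """
    ranks = [rank_of_gold(row, g) for row, g in zip(sim, gold)]
    n = len(ranks)
    hits = {k: sum(r <= k for r in ranks) / n for k in ks}
    mrr = sum(1.0 / r for r in ranks) / n
    return hits, mrr


# Toy scores (made up): the gold targets are ranked 1st, 2nd, and 2nd.
toy_sim = [[0.9, 0.1, 0.0],
           [0.2, 0.3, 0.8],
           [0.1, 0.6, 0.5]]
toy_gold = [0, 1, 2]
toy_hits, toy_mrr = hits_at_k_and_mrr(toy_sim, toy_gold, ks=(1, 2))
print(toy_hits, toy_mrr)
```

Here Hits@1 is 1/3 because only the first source ranks its gold target on top, Hits@2 is 1.0, and MRR is (1 + 1/2 + 1/2) / 3 = 2/3.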
Fig. 3  Training loss convergence curves on the DBP15K subsets
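Method variant                             DBP_ZH-EN                DBP_JA-EN                DBP_FR-EN
                                           Hits@1  Hits@10  MRR     Hits@1  Hits@10  MRR     Hits@1  Hits@10  MRR
Full model                                 0.871   0.950    0.900   0.938   0.982    0.955   0.976   0.995    0.984
Without sparse feature propagation module  0.836   0.932    0.878   0.902   0.971    0.930   0.942   0.985    0.962
Without relation embedding module          0.819   0.924    0.867   0.889   0.958    0.923   0.934   0.982    0.958
Without character-level embedding module   0.850   0.940    0.890   0.921   0.975    0.942   0.960   0.990    0.975
Without Sinkhorn normalization module      0.845   0.937    0.885   0.918   0.973    0.940   0.956   0.988    0.973
Without Hungarian algorithm module         0.790   0.910    0.840   0.864   0.946    0.905   0.918   0.975    0.948
Table 3  Ablation results for the modules of the proposed entity alignment method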
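Fig. 4  Hits@1 on the DBP_ZH-EN dataset at different temperatures
Fig. 5  Hits@1 on the DBP15K subsets at different propagation depths
Fig. 6  Hits@1 and Hits@10 for different numbers of negative sample triples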
1 CHEN X, JIA S, XIANG Y. A review: knowledge reasoning over knowledge graph[J]. Expert Systems with Applications, 2020, 141: 112948. doi: 10.1016/j.eswa.2019.112948
2 ZHAO X, ZENG W, TANG J, et al. An experimental study of state-of-the-art entity alignment approaches[J]. IEEE Transactions on Knowledge and Data Engineering, 2022, 34 (6): 2610-2625.
3 FU T C, CHUNG F L, LUK R, et al. Stock time series pattern matching: template-based vs. rule-based approaches[J]. Engineering Applications of Artificial Intelligence, 2007, 20 (3): 347-364. doi: 10.1016/j.engappai.2006.07.003
4 CHANDRASEKARAN D, MAGO V. Evolution of semantic similarity: a survey[J]. ACM Computing Surveys, 2021, 54 (2): 1-37.
5 HERRMANN L, KOLLMANNSBERGER S. Deep learning in computational mechanics: a review[J]. Computational Mechanics, 2024, 74 (2): 281-331. doi: 10.1007/s00466-023-02434-4
6 CORSO G, STARK H, JEGELKA S, et al. Graph neural networks[J]. Nature Reviews Methods Primers, 2024, 4: 17. doi: 10.1038/s43586-024-00294-7
7 QAISER S, ALI R. Text mining: use of TF-IDF to examine the relevance of words to documents[J]. International Journal of Computer Applications, 2018, 181 (1): 25-29. doi: 10.5120/ijca2018917395
8 COHEN W W, RICHMAN J. Learning to match and cluster large high-dimensional data sets for data integration[C]// Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Edmonton: ACM, 2002: 475-480.
9 BUSCALDI D, ROSSO P, GÓMEZ-SORIANO J M, et al. Answering questions with an n-gram based passage retrieval engine[J]. Journal of Intelligent Information Systems, 2010, 34 (2): 113-134. doi: 10.1007/s10844-009-0082-y
10 SARAWAGI S, BHAMIDIPATY A. Interactive deduplication using active learning[C]// Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Edmonton: ACM, 2002: 269-278.
11 ARASU A, GÖTZ M, KAUSHIK R. On active learning of record matching packages[C]// Proceedings of the 2010 ACM SIGMOD International Conference on Management of data. Indianapolis: ACM, 2010: 783-794.
12 JEAN-MARY Y R, SHIRONOSHITA E P, KABUKA M R. ASMOV: results for OAEI 2010[C]// Proceedings of the 5th International Workshop on Ontology Matching (OM 2010). Shanghai: [s.n.], 2010: 114-121.
13 SUCHANEK F M, ABITEBOUL S, SENELLART P. PARIS: probabilistic alignment of relations, instances, and schema[EB/OL]. (2011-11-30)[2025-03-05]. https://arxiv.org/pdf/1111.7164.
14 LACOSTE-JULIEN S, PALLA K, DAVIES A, et al. SiGMa: simple greedy matching for aligning large knowledge bases[C]// Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Chicago: ACM, 2013: 572-580.
15 SONG D, LUO Y, HEFLIN J. Linking heterogeneous data in the semantic web using scalable and domain-independent candidate selection[J]. IEEE Transactions on Knowledge and Data Engineering, 2017, 29 (1): 143-156. doi: 10.1109/TKDE.2016.2606399
16 CHEN M, TIAN Y, YANG M, et al. Multilingual knowledge graph embeddings for cross-lingual knowledge alignment[EB/OL]. (2017-05-17)[2025-03-05]. https://arxiv.org/pdf/1611.03954.
17 FEY M, LENSSEN J E, MORRIS C, et al. Deep graph matching consensus[EB/OL]. (2020-01-27)[2025-03-05]. https://arxiv.org/pdf/2001.09621.
18 CAO Y, LIU Z, LI C, et al. Multi-channel graph neural network for entity alignment[EB/OL]. (2019-08-26)[2025-03-05]. https://arxiv.org/pdf/1908.09898.
19 SUN Z, HU W, ZHANG Q, et al. Bootstrapping entity alignment with knowledge graph embedding[C]// Proceedings of the 27th International Joint Conference on Artificial Intelligence. Stockholm: ACM, 2018: 4396-4402.
20 MAO X, WANG W, WU Y, et al. Are negative samples necessary in entity alignment? An approach with high performance, scalability and robustness[C]// Proceedings of the 30th ACM International Conference on Information and Knowledge Management. [S.l.]: ACM, 2021: 1263-1273.
21 MAO X, WANG W, XU H, et al. MRAEA: an efficient and robust entity alignment approach for cross-lingual knowledge graph[C]// Proceedings of the 13th International Conference on Web Search and Data Mining. Houston: ACM, 2020: 420-428.
22 LIU Z, CAO Y, PAN L, et al. Exploring and evaluating attributes, values, and structures for entity alignment[EB/OL]. (2021-01-02)[2025-03-05]. https://arxiv.org/pdf/2010.03249.
23 WU Y, LIU X, FENG Y, et al. Relation-aware entity alignment for heterogeneous knowledge graphs[EB/OL]. (2019-08-22)[2025-03-05]. https://arxiv.org/pdf/1908.08210.
24 WU Y, LIU X, FENG Y, et al. Jointly learning entity and relation representations for entity alignment[EB/OL]. (2019-09-20)[2025-03-05]. https://arxiv.org/pdf/1909.09317.
25 CHEN M, SHI W, ZHOU B, et al. Cross-lingual entity alignment with incidental supervision[EB/OL]. (2021-01-26)[2025-03-05]. https://arxiv.org/pdf/2005.00171.
26 WANG Z, YANG J, YE X. Knowledge graph alignment with entity-pair embedding[C]// Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. [S.l.]: ACL, 2020: 1672-1680.
27 TANG J, ZHAO K, LI J. A fused Gromov-Wasserstein framework for unsupervised knowledge graph entity alignment[EB/OL]. (2023-05-11)[2025-03-05]. https://arxiv.org/pdf/2305.06574.
28 ZHAO Y, WU Y, CAI X, et al. From alignment to entailment: a unified textual entailment framework for entity alignment[C]// Findings of the Association for Computational Linguistics. Toronto: ACL, 2023: 8795-8806.
29 PATRINI G, VAN DEN BERG R, FORRE P, et al. Sinkhorn autoencoders[C]// 35th Uncertainty in Artificial Intelligence Conference. Toronto: PMLR, 2020: 733-743.
30 HAMUDA E, MC GINLEY B, GLAVIN M, et al. Improved image processing-based crop detection using Kalman filtering and the Hungarian algorithm[J]. Computers and Electronics in Agriculture, 2018, 148: 37-44. doi: 10.1016/j.compag.2018.02.027
31 GRANGER S, BESTGEN Y. The use of collocations by intermediate vs. advanced non-native writers: a bigram-based study[J]. International Review of Applied Linguistics in Language Teaching, 2014, 52 (3): 229-252.