大模型知识引导的复合多注意力文档级关系抽取方法

doi:10.3785/j.issn.1008-973X.2025.09.003

浙江大学学报(工学版)

2025, Vol. 59

Issue (9): 1793-1802 DOI: 10.3785/j.issn.1008-973X.2025.09.003

计算机技术

大模型知识引导的复合多注意力文档级关系抽取方法

竹志超1(

),李建强1,齐宏智1,赵青1,*(

),高齐2,李思颖2,蔡嘉怡2,沈金炎2

1. 北京工业大学计算机学院，北京 100124
2. 北京工业大学北京-都柏林国际学院，北京 100124

Large model knowledge-guided composite multi-attention method for document-level relation extraction

Zhichao ZHU1(

),Jianqiang LI1,Hongzhi QI1,Qing ZHAO1,*(

),Qi GAO2,Siying LI2,Jiayi CAI2,Jinyan SHEN2

1. College of Computer Science, Beijing University of Technology, Beijing 100124, China
2. Beijing-Dublin International College, Beijing University of Technology, Beijing 100124, China

全文: PDF(1235 KB) HTML

摘要：

针对现有文档级关系抽取（DRE）方法对各类语义信息内部特征的重要性区分不足以及外部领域知识规模受限、实时扩展困难的问题，提出大语言模型知识引导的复合多注意力（LKCM）方法. 通过集成复合多注意力框架，利用注意力机制对词、句和文档级特征进行细致提取，有效区分不同语义信息内部特征的重要性；将大语言模型微调为动态领域知识库组件，借助其广泛的常识性知识和强大的推理能力，持续为模型提供知识指导，有效缓解知识规模有限和难以实时扩展的问题. 在真实医学关系数据集上的实验结果表明，LKCM在F1指标上的平均值超出最佳基线方法1.54个百分点. 该方法显著提高了长距离跨句关系的捕捉能力，增强了对关键特征的辨识效果，具备较好的性能和推广价值.

关键词： 文档级关系抽取; 领域知识; 注意力; 大语言模型; 常识推理

Abstract:

A large language model knowledge-guided composite multi-attention (LKCM) method was proposed to address the shortcomings in current document-level relation extraction (DRE) methods, namely, the insufficient differentiation of internal feature importance in various semantic information and the limited, hard-to-expand scale of external domain knowledge. By integrating a composite multi-attention framework, the attention mechanism was utilized to meticulously extract features at the word, sentence, and document levels to effectively distinguish the varying importance of internal features across different semantic information. A large language model was fine-tuned as a dynamic domain knowledge base component and its extensive commonsense knowledge and reasoning capabilities were leveraged to continuously provide guidance for the model. This design effectively mitigates the issues of limited knowledge scale and difficult real-time expansion. Experimental results on a real-world medical relation dataset showed that the average F1 score of the LKCM was 1.54 percentage points higher than that of the best baseline. The comprehensive analysis demonstrated that this method not only enhanced the capture of long-distance, cross-sentence relations but also improved the identification of key features. The LKCM method exhibits strong performance and broad applicability.

Key words: document-level relation extraction domain knowledge attention large language model common sense reasoning

收稿日期: 2024-09-25 出版日期: 2025-08-25

CLC:

TP 393

基金资助: 国家科学基金联合基金资助项目（U20A2018）；北京市卫生健康委员会高级公共卫生技术人才建设项目（领军人才03-10）.

通讯作者: 赵青 E-mail: zhuzc@emails.bjut.edu.cn;zhaoqing@bjut.edu.cn

作者简介: 竹志超（1994—），男，博士生，从事自然语言处理、医学人工智能研究. orcid.org/0000-0002-1544-8831. E-mail：zhuzc@emails.bjut.edu.cn

	服务
	把本文推荐给朋友
	加入引用管理器
	E-mail Alert
	作者相关文章
	竹志超
	李建强
	齐宏智
	赵青
	高齐
	李思颖
	蔡嘉怡
	沈金炎

引用本文:

竹志超,李建强,齐宏智,赵青,高齐,李思颖,蔡嘉怡,沈金炎. 大模型知识引导的复合多注意力文档级关系抽取方法[J]. 浙江大学学报(工学版), 2025, 59(9): 1793-1802.

Zhichao ZHU,Jianqiang LI,Hongzhi QI,Qing ZHAO,Qi GAO,Siying LI,Jiayi CAI,Jinyan SHEN. Large model knowledge-guided composite multi-attention method for document-level relation extraction. Journal of ZheJiang University (Engineering Science), 2025, 59(9): 1793-1802.

链接本文:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2025.09.003 或 https://www.zjujournals.com/eng/CN/Y2025/V59/I9/1793

图 1 大语言模型知识引导的复合多注意力架构

表 1 与先进模型的性能对比结果

表 2 不同组件下模型的性能对比结果

表 3 LKCM和LKCM-1的注意力权重可视化结果

表 4 LKCM与基线模型的统计显著性分析结果

1	ZHAO Q, XU D, LI J, et al Knowledge guided distance supervision for biomedical relation extraction in Chinese electronic medical records[J]. Expert Systems with Applications, 2022, 204: 117606 doi: 10.1016/j.eswa.2022.117606
2	HEIST N, PAULHEIM H. Language-agnostic relation extraction from wikipedia abstracts [C]// International Semantic Web Conference. Vienna: Springer, 2017: 383–399.
3	ZENG D, LIU K, LAI S, et al. Relation classification via convolutional deep neural network [C]// International Conference on Computational Linguistics. Dublin: ACL, 2014: 2335–2344.
4	ZHANG Y, QI P, MANNING C D. Graph convolution over pruned dependency trees improves relation extraction [C]// Conference on Empirical Methods in Natural Language Processing. Brussels: ACL, 2018: 2205–2215.
5	YAO Y, YE D, LI P, et al. DocRED: a large-scale document-level relation extraction dataset [C]// Annual Meeting of the Association for Computational Linguistics. Florence: ACL, 2019: 764–777.
6	KIPF T N, WELLING M. Semi-supervised classification with graph convolutional networks [EB/OL]. (2016-09-09). https://arxiv.org/abs/1609.02907.
7	WANG D, HU W, CAO E, et al. Global-to-local neural networks for document-level relation extraction [C]// Conference on Empirical Methods in Natural Language Processing. [S.l.]: ACL, 2020: 3711–3721.
8	LIU H, KANG Z, ZHANG L, et al. Document-level relation extraction with cross-sentence reasoning graph [C]// Pacific-Asia Conference on Knowledge Discovery and Data Mining. Osaka: Springer, 2023: 316–328.
9	ZHOU H, XU Y, YAO W, et al. Global context-enhanced graph convolutional networks for document-level relation extraction [C]// International Conference on Computational Linguistics. Barcelona: ICCL, 2020: 5259–5270.
10	VRANDEČIĆ D, KRÖTZSCH M Wikidata: a free collaborative knowledgebase[J]. Communications of the ACM, 2014, 57 (10): 78- 85 doi: 10.1145/2629489
11	AUER S, BIZER C, KOBILAROV G, et al. DBpedia: a nucleus for a web of open data [C]// International Semantic Web Conference. Busan: Springer, 2007: 722–735.
12	BASTOS A, NADGERI A, SINGH K, et al. RECON: relation extraction using knowledge graph context in a graph neural network [C]// International World Wide Web Conference. Ljubljana: ACM, 2021: 1673–1685.
13	FERNÀNDEZ-CAÑELLAS D, MARCO RIMMEK J, ESPADALER J, et al. Enhancing online knowledge graph population with semantic knowledge [C]// International Semantic Web Conference. Athens: Springer, 2020: 183–200.
14	PAN J, ZHANG M, SINGH K, et al. Entity enabled relation linking [C]// International Semantic Web Conference. Auckland: Springer, 2019: 523–538.
15	WANG X, WANG Z, SUN W, et al. Enhancing document-level relation extraction by entity knowledge injection [C]// International Semantic Web Conference. [S.l.]: Springer, 2022: 39–56.
16	WANG H, QIN K, LU G, et al Document-level relation extraction using evidence reasoning on RST-GRAPH[J]. Knowledge-Based Systems, 2021, 228: 107274 doi: 10.1016/j.knosys.2021.107274
17	SOUSA D, COUTO F M Biomedical relation extraction with knowledge graph-based recommendations[J]. IEEE Journal of Biomedical and Health Informatics, 2022, 26 (8): 4207- 4217 doi: 10.1109/JBHI.2022.3173558
18	CHEN J, HU B, PENG W, et al Biomedical relation extraction via knowledge-enhanced reading comprehension[J]. BMC Bioinformatics, 2022, 23 (1): 20 doi: 10.1186/s12859-021-04534-5
19	ZHANG B, LI L, SONG D, et al Biomedical event causal relation extraction based on a knowledge-guided hierarchical graph network[J]. Soft Computing, 2023, 27 (22): 17369- 17386 doi: 10.1007/s00500-023-08882-7
20	DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional Transformers for language understanding [EB/OL]. (2018-10-11). https://arxiv.org/abs/1810.04805v2.
21	WANG H, FOCKE C, SYLVESTER R, et al. Fine-tune BERT for DocRED with two-step process [EB/OL]. (2019-09-26). https://arxiv.org/abs/1909.11898v1.
22	ZHOU W, HUANG K, MA T, et al. Document-level relation extraction with adaptive thresholding and localized context pooling [C]// AAAI Conference on Artificial Intelligence. [S.l.]: AAAI Press, 2021: 14612–14620.
23	XU B, WANG Q, LYU Y, et al. Entity structure within and throughout: modeling mention dependencies for document-level relation extraction [C]// AAAI Conference on Artificial Intelligence. [S.l.]: AAAI Press, 2021: 14149–14157.
24	QUIRK C, POON H. Distant supervision for relation extraction beyond the sentence boundary [C]// Conference of the European Chapter of the Association for Computational Linguistics. Valencia: ACL, 2017: 1171–1182.
25	PENG N, POON H, QUIRK C, et al Cross-sentence N-ary relation extraction with graph LSTMs[J]. Transactions of the Association for Computational Linguistics, 2017, 5: 101- 115 doi: 10.1162/tacl_a_00049
26	VERGA P, STRUBELL E, MCCALLUM A. Simultaneously self-attending to all mentions for full-abstract biological relation extraction [C]// Conference of the North American Chapter of the Association for Computational Linguistics. New Orleans: ACL, 2018: 872–884.
27	VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]// International Conference on Neural Information Processing Systems. Long Beach: NeurIPS Foundation, 2017: 6000–6010.
28	NAN G, GUO Z, SEKULIC I, et al. Reasoning with latent structure refinement for document-level relation extraction [C]// Annual Meeting of the Association for Computational Linguistics. [S.l.]: ACL, 2020: 1546–1557.
29	ZENG S, XU R, CHANG B, et al. Double graph based reasoning for document-level relation extraction [C]// Conference on Empirical Methods in Natural Language Processing. [S.l.]: ACL, 2020: 1630–1640.
30	LIU Y, OTT M, GOYAL N, et al. RoBERTa: a robustly optimized BERT pretraining approach [EB/OL]. (2019-07-26). https://arxiv.org/abs/1907.11692.
31	ZENG A, XU B, WANG B, et al. ChatGLM: a family of large language models from GLM-130B to GLM-4 all tools. (2024-07-30). https://arxiv.org/abs/2406.12793.
32	OUYANG L, WU J, XU J, et al. Training language models to follow instructions with human feedback [C]// International Conference on Neural Information Processing Systems. New Orleans: NeurIPS Foundation, 2022: 27730–27744.
33	DUBEY A, JAUHR A, PANDEY A, et al. The Llama 3 herd of models [EB/OL]. (2024-07-31). https://arxiv.org/abs/2407.21783.
34	HU E J, SHEN Y, WALLIS P, et al. LoRA: low-rank adaptation of large language models [EB/OL]. (2021-10-16). https://arxiv.org/abs/2106.09685.
35	VELIČKOVIĆ P, CUCURULL G, CASANOVA A, et al. Graph attention networks [EB/OL]. (2017-10-30). https://arxiv.org/abs/1710.10903.

[1]	林宜山,左景,卢树华. 基于多头自注意力机制与MLP-Interactor的多模态情感分析[J]. 浙江大学学报(工学版), 2025, 59(8): 1653-1661.
[2]	付家瑞,李兆飞,周豪,黄惟. 基于Convnextv2与纹理边缘引导的伪装目标检测[J]. 浙江大学学报(工学版), 2025, 59(8): 1718-1726.
[3]	王盼蓉,贾海蓉,段淑斐. 基于卷积和门控注意的两阶段视听语音增强算法[J]. 浙江大学学报(工学版), 2025, 59(8): 1662-1670.
[4]	翟亚红,陈雅玲,徐龙艳,龚玉. 改进YOLOv8s的轻量级无人机航拍小目标检测算法[J]. 浙江大学学报(工学版), 2025, 59(8): 1708-1717.
[5]	张学军,梁书滨,白万荣,张奉鹤,黄海燕,郭梅凤,陈卓. 基于异构图表征的源代码漏洞检测方法[J]. 浙江大学学报(工学版), 2025, 59(8): 1644-1652.
[6]	杨荣泰,邵玉斌,杜庆治. 基于结构感知的少样本知识补全[J]. 浙江大学学报(工学版), 2025, 59(7): 1394-1402.
[7]	刘杰,吴优,田佳禾,韩轲. 改进Transformer的肺部CT图像超分辨率重建[J]. 浙江大学学报(工学版), 2025, 59(7): 1434-1442.
[8]	何婧瑶,李鹏飞,汪承志,吕振鸣,牟萍. 基于双目视觉和改进YOLOv8的动态三维重建方法[J]. 浙江大学学报(工学版), 2025, 59(7): 1443-1450.
[9]	杨宇豪,郭永存,李德永,王爽. 基于视觉信息的煤矸识别分割定位方法[J]. 浙江大学学报(工学版), 2025, 59(7): 1421-1433.
[10]	王圣举,张赞. 基于加速扩散模型的缺失值插补算法[J]. 浙江大学学报(工学版), 2025, 59(7): 1471-1480.
[11]	章东平,王大为,何数技,汤斯亮,刘志勇,刘中秋. 基于跨维度特征融合的航空发动机寿命预测[J]. 浙江大学学报(工学版), 2025, 59(7): 1504-1513.
[12]	杨燕,晁丽鹏. 基于多维协同注意力的双支特征联合去雾网络[J]. 浙江大学学报(工学版), 2025, 59(6): 1119-1129.
[13]	鞠文博,董华军. 基于上下文信息融合与动态采样的主板缺陷检测方法[J]. 浙江大学学报(工学版), 2025, 59(6): 1159-1168.
[14]	梁耕良,韩曙光. 基于改进RT-DETR的牛仔面料疵点检测算法[J]. 浙江大学学报(工学版), 2025, 59(6): 1169-1178.
[15]	蔡永青,韩成,权巍,陈兀迪. 基于注意力机制的视觉诱导晕动症评估模型[J]. 浙江大学学报(工学版), 2025, 59(6): 1110-1118.

Viewed

Full text

Abstract

Cited

Shared

Discussed