|
|
Large model knowledge-guided composite multi-attention method for document-level relation extraction |
Zhichao ZHU1, Jianqiang LI1, Hongzhi QI1, Qing ZHAO1,*, Qi GAO2, Siying LI2, Jiayi CAI2, Jinyan SHEN2
1. College of Computer Science, Beijing University of Technology, Beijing 100124, China
2. Beijing-Dublin International College, Beijing University of Technology, Beijing 100124, China
|
|
Abstract A large language model knowledge-guided composite multi-attention (LKCM) method was proposed to address two shortcomings of current document-level relation extraction (DRE) methods: insufficient differentiation of the importance of internal features within different types of semantic information, and the limited, hard-to-expand scale of external domain knowledge. A composite multi-attention framework was integrated so that attention mechanisms extract features at the word, sentence, and document levels, effectively distinguishing the varying importance of internal features across different semantic information. A large language model was fine-tuned to serve as a dynamic domain knowledge base component, and its extensive commonsense knowledge and reasoning capabilities were leveraged to provide continuous guidance for the model, mitigating the problems of limited knowledge scale and difficult real-time expansion. Experimental results on a real-world medical relation dataset showed that the average F1 score of LKCM was 1.54 percentage points higher than that of the best baseline. Comprehensive analysis demonstrated that the method not only enhances the capture of long-distance, cross-sentence relations but also improves the identification of key features, exhibiting strong performance and broad applicability.
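To make the composite multi-attention idea more concrete, the sketch below shows hierarchical attention pooling that builds sentence representations from word vectors and a document representation from sentence vectors. It is a minimal PyTorch illustration under assumed names and shapes (AttentionPool, HierarchicalAttentionEncoder, a 768-dimensional representation); the paper's actual architecture, its relation classifier, and the LLM knowledge-guidance component are not reproduced here.

# Hypothetical sketch of word -> sentence -> document attention pooling.
# Names and dimensions are illustrative, not the authors' implementation.
import torch
import torch.nn as nn
from typing import List


class AttentionPool(nn.Module):
    """Additive attention that pools a sequence of vectors into a single vector."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_items, dim); attention weights sum to 1 over the items
        weights = torch.softmax(self.score(x), dim=0)
        return (weights * x).sum(dim=0)


class HierarchicalAttentionEncoder(nn.Module):
    """Word-level attention pools each sentence; sentence-level attention pools the document."""

    def __init__(self, dim: int):
        super().__init__()
        self.word_attn = AttentionPool(dim)
        self.sent_attn = AttentionPool(dim)

    def forward(self, sentences: List[torch.Tensor]) -> torch.Tensor:
        # sentences: one (num_words_i, dim) word-embedding matrix per sentence
        sent_vecs = torch.stack([self.word_attn(s) for s in sentences])
        return self.sent_attn(sent_vecs)  # (dim,) document representation


if __name__ == "__main__":
    encoder = HierarchicalAttentionEncoder(dim=768)
    doc = [torch.randn(12, 768), torch.randn(8, 768), torch.randn(20, 768)]
    print(encoder(doc).shape)  # torch.Size([768])

In a full DRE model, such pooled representations would be combined with entity-pair features and with the guidance signal from the fine-tuned large language model before relation classification.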
|
Received: 25 September 2024
Published: 25 August 2025
|
|
Fund: Joint Fund of the National Natural Science Foundation of China (U20A2018); High-level Public Health Technical Talent Development Project of the Beijing Municipal Health Commission (Leading Talent 03-10).
Corresponding author: Qing ZHAO
E-mail: zhuzc@emails.bjut.edu.cn; zhaoqing@bjut.edu.cn
|
Keywords:
document-level relation extraction,
domain knowledge,
attention,
large language model,
commonsense reasoning