|
|
Large model knowledge-guided composite multi-attention method for document-level relation extraction |
Zhichao ZHU1, Jianqiang LI1, Hongzhi QI1, Qing ZHAO1,*, Qi GAO2, Siying LI2, Jiayi CAI2, Jinyan SHEN2
1. College of Computer Science, Beijing University of Technology, Beijing 100124, China
2. Beijing-Dublin International College, Beijing University of Technology, Beijing 100124, China
|
|
Abstract A large language model knowledge-guided composite multi-attention (LKCM) method was proposed to address two shortcomings of current document-level relation extraction (DRE) methods: insufficient differentiation of the importance of internal features within different types of semantic information, and the limited, hard-to-expand scale of external domain knowledge. A composite multi-attention framework was integrated so that attention mechanisms extract features at the word, sentence, and document levels, effectively distinguishing the varying importance of internal features across different semantic information. A large language model was fine-tuned to serve as a dynamic domain knowledge base component, and its extensive commonsense knowledge and reasoning capabilities were leveraged to provide continuous guidance for the model, mitigating the problems of limited knowledge scale and difficult real-time expansion. Experimental results on a real-world medical relation dataset showed that the average F1 score of LKCM was 1.54 percentage points higher than that of the best baseline. Comprehensive analysis demonstrated that the method not only enhances the capture of long-distance, cross-sentence relations but also improves the identification of key features, exhibiting strong performance and broad applicability.
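To make the composite multi-attention idea more concrete, the sketch below shows hierarchical attention pooling that builds sentence representations from word vectors and a document representation from sentence vectors. It is a minimal PyTorch illustration under assumed names and shapes (AttentionPool, HierarchicalAttentionEncoder, a 768-dimensional representation); the paper's actual architecture, its relation classifier, and the LLM knowledge-guidance component are not reproduced here.

# Hypothetical sketch of word -> sentence -> document attention pooling.
# Names and dimensions are illustrative, not the authors' implementation.
import torch
import torch.nn as nn
from typing import List


class AttentionPool(nn.Module):
    """Additive attention that pools a sequence of vectors into a single vector."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_items, dim); attention weights sum to 1 over the items
        weights = torch.softmax(self.score(x), dim=0)
        return (weights * x).sum(dim=0)


class HierarchicalAttentionEncoder(nn.Module):
    """Word-level attention pools each sentence; sentence-level attention pools the document."""

    def __init__(self, dim: int):
        super().__init__()
        self.word_attn = AttentionPool(dim)
        self.sent_attn = AttentionPool(dim)

    def forward(self, sentences: List[torch.Tensor]) -> torch.Tensor:
        # sentences: one (num_words_i, dim) word-embedding matrix per sentence
        sent_vecs = torch.stack([self.word_attn(s) for s in sentences])
        return self.sent_attn(sent_vecs)  # (dim,) document representation


if __name__ == "__main__":
    encoder = HierarchicalAttentionEncoder(dim=768)
    doc = [torch.randn(12, 768), torch.randn(8, 768), torch.randn(20, 768)]
    print(encoder(doc).shape)  # torch.Size([768])

In a full DRE model, such pooled representations would be combined with entity-pair features and with the guidance signal from the fine-tuned large language model before relation classification.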
|
Received: 25 September 2024
Published: 25 August 2025
|
|
Fund: Joint Fund of the National Natural Science Foundation of China (U20A2018); High-level Public Health Technical Talent Development Project of the Beijing Municipal Health Commission (Leading Talent 03-10).
Corresponding author: Qing ZHAO
E-mail: zhuzc@emails.bjut.edu.cn; zhaoqing@bjut.edu.cn
|
Keywords:
document-level relation extraction,
domain knowledge,
attention,
large language model,
commonsense reasoning