Journal of ZheJiang University (Engineering Science)  2022, Vol. 56 Issue (2): 245-253    DOI: 10.3785/j.issn.1008-973X.2022.02.004
    
Construction method of extraction dataset of Al-Si alloy entity relationship
Ying-li LIU1,2, Rui-gang WU1,2, Chang-hui YAO1,2, Tao SHEN1,2,*
1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
2. Yunnan Key Laboratory of Computer Technologies Application, Kunming University of Science and Technology, Kunming 650500, China

Abstract  

At present, there is no public dataset suitable for research on entity relationship extraction in the materials field. To address this problem, a construction method for an aluminum-silicon (Al-Si) alloy entity relationship extraction dataset was proposed, based on the literature on spray deposition of high-silicon aluminum alloys. Construction standards for the dataset were formulated under the guidance of experts in the materials field, and the entities and relationships in the collected data were annotated according to these standards. After annotation, the Al-Si alloy entity relationship extraction dataset was generated through data preprocessing. Experiments with an entity relationship joint extraction model verified that the dataset can be used for entity relationship extraction tasks. Compared with the public dataset, the sentences in the materials dataset have more complex semantics and grammar and include more long sentences, so the joint extraction model performed slightly worse on the materials dataset. A self-attention mechanism was therefore added to the joint extraction model, which increased the overall F1 value by about 5.8 percentage points. The dataset construction method is general and can be used to construct materials datasets.
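The abstract states that a self-attention mechanism was added to the joint extraction model but does not specify its form. The sketch below assumes standard single-head scaled dot-product self-attention applied to one sentence's token vectors; the projection matrices, their sizes, and the toy inputs are illustrative assumptions, not the authors' configuration. Each output vector is a softmax-weighted mix of every token in the sentence, which lets distant tokens in a long sentence interact directly.

```python
import math

def matmul(a, b):
    """Multiply an (n x m) matrix by an (m x p) matrix, both given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    m = max(row)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(h, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over one sentence.

    Assumption: this is the generic textbook layer, not the paper's exact design.
    h: n x d token vectors (e.g. encoder outputs); w_q, w_k, w_v: d x d_k projections.
    Returns the n x d_k context vectors and the n x n attention weights.
    """
    q, k, v = matmul(h, w_q), matmul(h, w_k), matmul(h, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]  # transpose K to d_k x n
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, v), weights

# Toy example: three 2-d token vectors and identity projections (illustrative values only)
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
context, attn = self_attention(h, identity, identity, identity)
for row in attn:
    print([round(w, 3) for w in row])
```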



Key words: dataset; construction standard; data annotation; entity relationship joint extraction model; self-attention mechanism
Received: 12 July 2021      Published: 03 March 2022
CLC:  TP 391.1  
Corresponding Authors: Tao SHEN     E-mail: lyl2002@126.com;shentao@kust.edu.cn
Cite this article:

Ying-li LIU,Rui-gang WU,Chang-hui YAO,Tao SHEN. Construction method of extraction dataset of Al-Si alloy entity relationship. Journal of ZheJiang University (Engineering Science), 2022, 56(2): 245-253.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2022.02.004     OR     https://www.zjujournals.com/eng/Y2022/V56/I2/245


Dataset | Ne | Nr
CoNLL-2004 | 4 | 5
ACE04 | 7 | 7
ADE | 2 | 1
DREC | 9 | 2
Tab.1 Number of entity types (Ne) and relation types (Nr) in public datasets
Fig.1 Overall framework diagram of extraction dataset of aluminum-silicon alloy entity relationship
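Property | Keyword | English full name | Abbreviation
Tensile | tensile strength | tensile strength | UTS/Rm/σb
Tensile | elongation | elongation | EL/δ
Hardness | Vickers hardness | Vickers hardness | HV
Hardness | Brinell hardness | Brinell hardness | HB
Hardness | Rockwell hardness | Rockwell hardness | HR
Microstructure | transmission | transmission electron microscope | TEM
Microstructure | scanning | scanning electron microscopy | SEM
Microstructure | optical microscope | optical microscope | OM
Microstructure | electron back scattering diffraction | electron back scattering diffraction | EBSD
Coefficient of thermal expansion | - | coefficient of thermal expansion | CTE
Tab.2 Main properties of Al-Si alloys of interest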
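Tab.2 pairs each property of interest with keywords and abbreviations. One plausible use, assumed here rather than taken from the paper, is to flag sentences in the collected literature that mention these properties before annotation. The grouping into five canonical properties and the function names are hypothetical.

```python
import re

# Hypothetical keyword table built from Tab.2: canonical property -> surface forms.
PROPERTY_KEYWORDS = {
    "tensile strength": ["tensile strength", "UTS", "Rm", "σb"],
    "elongation": ["elongation", "EL", "δ"],
    "hardness": ["Vickers hardness", "Brinell hardness", "Rockwell hardness", "HV", "HB", "HR"],
    "microstructure": ["transmission electron microscope", "scanning electron microscopy",
                       "optical microscope", "electron back scattering diffraction",
                       "TEM", "SEM", "OM", "EBSD"],
    "thermal expansion": ["coefficient of thermal expansion", "CTE"],
}

def build_pattern(terms):
    # Longer phrases are tried first; \b stops short abbreviations such as "EL"
    # from matching inside words like "MODEL". Matching is case-sensitive.
    alternatives = sorted(terms, key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(map(re.escape, alternatives)) + r")\b")

PATTERNS = {prop: build_pattern(terms) for prop, terms in PROPERTY_KEYWORDS.items()}

def properties_mentioned(sentence):
    """Return the set of Tab.2 properties whose keywords occur in the sentence."""
    return {prop for prop, pattern in PATTERNS.items() if pattern.search(sentence)}

print(properties_mentioned(
    "Fig.1(c) shows the scanning electron microscopy microstructure of the spray-deposited A2 alloy."))
```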
Fig.2 Structure diagram of entity type and relationship type
Fig.3 Brat manual annotation interface
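Entity type | English name | Brat label
Element | Element | Ele
Content | Content | Con
Alloy | Alloy | Alloy
Experiment | Experiment | Exp
Experiment result | Experiment_result | Exp_r
Parameter name | Parameter_n | Par_n
Parameter value | Parameter_v | Par_v
Test name | Test_n | Test_n
Test value | Test_v | Test_v
Test figure | Test_f | Test_f
Phase | Phase | Phase
Tab.3 Annotated format of entity in Brat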
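Brat saves annotations in standoff .ann files, one line per annotation; text-bound entities appear as `T` lines holding an ID, a label with start and end character offsets, and the covered text. A minimal reader that checks labels against Tab.3 could look like this; the class and function names are hypothetical, and the offsets were computed by hand for the alloy example of Tab.6.

```python
from dataclasses import dataclass

# Entity labels from Tab.3
ENTITY_LABELS = {"Ele", "Con", "Alloy", "Exp", "Exp_r", "Par_n", "Par_v",
                 "Test_n", "Test_v", "Test_f", "Phase"}

@dataclass
class Entity:
    eid: str
    label: str
    start: int
    end: int
    text: str

def parse_entities(ann_text):
    """Read the text-bound ("T") lines of a Brat standoff .ann file.

    Each line looks like: T1<TAB>Alloy 15 36<TAB>Al-20Si-3Cu-1Mg alloy
    Discontinuous spans (offsets joined by ';') are not handled in this sketch.
    """
    entities = {}
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue  # relation (R) lines and other annotation kinds are skipped
        eid, type_and_span, text = line.split("\t")
        label, start, end = type_and_span.split(" ")
        if label not in ENTITY_LABELS:
            raise ValueError(f"{eid}: unknown entity label {label!r}")
        entities[eid] = Entity(eid, label, int(start), int(end), text)
    return entities

sentence = "In this paper, Al-20Si-3Cu-1Mg alloy was prepared by spray deposition technique."
ann = ("T1\tAlloy 15 36\tAl-20Si-3Cu-1Mg alloy\n"
       "T2\tExp 53 69\tspray deposition\n"
       "R1\tAlloy-Exp Arg1:T1 Arg2:T2")
for ent in parse_entities(ann).values():
    print(ent.eid, ent.label, sentence[ent.start:ent.end])
```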
Relation category | Relation | Entity 1 | Entity 2 | Brat label
Composition | Content-Element | Content | Element | Con-Ele
Composition | Element-Alloy | Element | Alloy | Ele-Alloy
Experiment | Alloy-Experiment | Alloy | Experiment | Alloy-Exp
Experiment | Experiment-Experiment_result | Experiment | Experiment result | Exp-Exp_r
Experiment | Experiment_result-Parameter_n | Experiment result | Parameter name | Exp_r-Par_n
Experiment | Experiment-Parameter_n | Experiment | Parameter name | Exp-Par_n
Test | Alloy-Test_n | Alloy | Test name | Alloy-Test_n
Test | Test_n-Parameter_n | Test name | Parameter name | Test_n-Par_n
Test | Test_n-Test_v | Test name | Test value | Test_n-Test_v
Test | Test_n-Test_f | Test name | Test figure | Test_n-Test_f
Test | Test_n-Phase | Test name | Phase | Test_n-Phase
Test | Phase-Test_v | Phase | Test value | Phase-Test_v
Parameter | Parameter_n-Parameter_v | Parameter name | Parameter value | Par_n-Par_v
Tab.4 Annotated format of relationships in Brat
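Tab.4 fixes, for every relation label, which entity types may appear as entity 1 and entity 2. A small consistency check over Brat `R` lines (of the form `R1<TAB>Alloy-Exp Arg1:T1 Arg2:T2`) can catch reversed or mistyped arguments. This sketch assumes that Brat's `Arg1` holds entity 1 and `Arg2` holds entity 2; the schema dictionary restates Tab.4, and the function name is hypothetical.

```python
# Relation schema from Tab.4: Brat relation label -> (type of entity 1, type of entity 2)
RELATION_SCHEMA = {
    "Con-Ele": ("Con", "Ele"),
    "Ele-Alloy": ("Ele", "Alloy"),
    "Alloy-Exp": ("Alloy", "Exp"),
    "Exp-Exp_r": ("Exp", "Exp_r"),
    "Exp_r-Par_n": ("Exp_r", "Par_n"),
    "Exp-Par_n": ("Exp", "Par_n"),
    "Alloy-Test_n": ("Alloy", "Test_n"),
    "Test_n-Par_n": ("Test_n", "Par_n"),
    "Test_n-Test_v": ("Test_n", "Test_v"),
    "Test_n-Test_f": ("Test_n", "Test_f"),
    "Test_n-Phase": ("Test_n", "Phase"),
    "Phase-Test_v": ("Phase", "Test_v"),
    "Par_n-Par_v": ("Par_n", "Par_v"),
}

def check_relations(ann_text, entity_types):
    """Return a list of problems found in the "R" lines of a Brat .ann file.

    entity_types maps entity IDs (T1, T2, ...) to their Tab.3 labels.
    """
    problems = []
    for line in ann_text.splitlines():
        if not line.startswith("R"):
            continue
        rid, body = line.split("\t")[:2]
        label, arg1, arg2 = body.split(" ")
        head, tail = arg1.split(":")[1], arg2.split(":")[1]
        if label not in RELATION_SCHEMA:
            problems.append(f"{rid}: unknown relation {label}")
            continue
        expected = RELATION_SCHEMA[label]
        actual = (entity_types.get(head), entity_types.get(tail))
        if actual != expected:
            problems.append(f"{rid}: {label} expects {expected}, got {actual}")
    return problems

types = {"T1": "Alloy", "T2": "Exp"}
print(check_relations("R1\tAlloy-Exp Arg1:T1 Arg2:T2\nR2\tAlloy-Exp Arg1:T2 Arg2:T1", types))
```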
Fig.4 Format of the dataset read by the relation extraction model
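Fig.4 is not reproduced here. The Multi-head model of Bekoulis et al. encodes entities with token-level BIO tags, so preprocessing has to map Brat's character-offset spans onto tokens. The sketch below covers only that step and assumes whitespace tokenization, which leaves punctuation attached to words; the tokenizer and file layout actually used for Fig.4 may differ, and the function name is hypothetical.

```python
import re

def bio_tags(sentence, spans):
    """Convert character-offset entity spans into token-level BIO tags.

    spans: list of (start, end, label) offsets as stored in Brat .ann files.
    Assumption: tokens are whitespace-delimited, so "paper," stays one token.
    """
    tokens, tags = [], []
    for match in re.finditer(r"\S+", sentence):
        tokens.append(match.group())
        tag = "O"
        for start, end, label in spans:
            if match.start() >= start and match.end() <= end:  # token lies inside the span
                tag = ("B-" if match.start() == start else "I-") + label
                break
        tags.append(tag)
    return tokens, tags

sentence = "In this paper, Al-20Si-3Cu-1Mg alloy was prepared by spray deposition technique."
tokens, tags = bio_tags(sentence, [(15, 36, "Alloy"), (53, 69, "Exp")])
for token, tag in zip(tokens, tags):
    print(token, tag)
```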
Dataset | Ns | Ne | Nr
CoNLL-2004 | 1441 | 5347 | 2020
Al-Si alloy relation extraction dataset | 2246 | 2522 | 1510
Tab.5 Dataset data volume comparison (Ns: sentences; Ne: entities; Nr: relations)
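Reading Ns, Ne, and Nr in Tab.5 as the numbers of sentences, entity mentions, and relation instances, the per-sentence densities follow by simple division. The Al-Si dataset has more sentences than CoNLL-2004 but fewer annotated entities and relations per sentence; the helper name is hypothetical.

```python
# Counts from Tab.5: (Ns, Ne, Nr)
DATASETS = {
    "CoNLL-2004": (1441, 5347, 2020),
    "Al-Si alloy": (2246, 2522, 1510),
}

def per_sentence(ns, ne, nr):
    """Average entity mentions and relation instances per sentence, rounded to 2 decimals."""
    return round(ne / ns, 2), round(nr / ns, 2)

for name, counts in DATASETS.items():
    ents, rels = per_sentence(*counts)
    print(f"{name}: {ents} entities/sentence, {rels} relations/sentence")
```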
Fig.5 Sentence length comparison of dataset
Entity type | Example
Content | However, 3% Mn addition leads to a substantial improvement of the tensile strengths at elevated temperature.
Element | However, 3% Mn addition leads to a substantial improvement of the tensile strengths at elevated temperature.
Alloy | In this paper, Al-20Si-3Cu-1Mg alloy was prepared by spray deposition technique.
Experiment | In this paper, Al-20Si-3Cu-1Mg alloy was prepared by spray deposition technique.
Experiment result | Cylindrical samples of 25 mm in diameter and 50 mm in length were machined out from each spray deposit obtained.
Test name | Fig.1(c) shows the scanning electron microscopy microstructure of the spray-deposited A2 alloy.
Test value | From table 2, with additions of 5% Fe to Al-20Si-3Cu-1Mg alloy, both the yield and ultimate tensile strengths was increased 55 and 64 MPa at room temperature.
Test figure | Fig.1(c) shows the scanning electron microscopy microstructure of the spray-deposited A2 alloy.
Phase | The formation of β-Al5FeSi phase in this transformation is known to be very slow.
Parameter name | The testing temperatures were 298, 473 and 573 K.
Parameter value | The testing temperatures were 298, 473 and 573 K.
Tab.6 Examples of entity types in the Al-Si alloy relation extraction dataset
Relation category | Example ((1) marks entity 1, (2) marks entity 2)
Composition | 1. (Content-Element) However, 3%(1) Mn(2) addition leads to a substantial improvement of the tensile strengths at elevated temperature.
Composition | 2. (Element-Alloy) Effect of Fe(1) and Mn(1) additions on microstructure and mechanical properties of spray-deposition Al-20Si-3Cu-1Mg alloy(2).
Experiment | 1. (Alloy-Experiment) In this paper, Al-20Si-3Cu-1Mg alloy(1) was prepared by spray deposition(2) technique.
Experiment | 2. (Experiment-Experiment result) Cylindrical samples(2) of 25 mm in diameter and 50 mm in length were machined out from each spray deposit obtained(1).
Experiment | 3. (Experiment-Parameter name) The solidification process(1) during spray-deposition occurs in two stages: gas atomization (rapid cooling)(2) and droplet consolidation (relatively slow cooling)(2).
Experiment | 4. (Experiment result-Parameter name) The starting carbon particle(2) were as large as 50 mm, while the TiC particles are less than 0.7 mm in reacted preforms(1).
Test | 1. (Alloy-Test name) The UTSs(2) of the TiC/Al(1) composites were improved over that of the unreinforced Al matrix
Test | 2. (Test name-Parameter name) As shown in Table 2, the UTS(1) of the TiC/Al composites at room temperature(2) was improved over that of the unreinforced Al matrix.
Test | 3. (Test name-Test value) From table 2, with additions of 5% Fe to Al-20Si-3Cu-1Mg alloy, both the yield(1) and ultimate tensile strengths(1) was increased 55(2) and 64 MPa(2) at room temperature.
Test | 4. (Test name-Test figure) Fig.1(c)(2) shows the scanning electron microscopy microstructure(1) of the spray-deposited A2 alloy.
Test | 5. (Test name-Phase) The high volume fraction of metastable δ-Al4FeSi2 phase(2) in the spray-deposited microstructure(1) may be attributed to two primary reasons.
Test | 6. (Phase-Test value) The microstructure of the as-deposited alloy is composed of primary Si(1) with an average size of 12.5 μm(2) and secondary Al phase.
Parameter | 1. (Parameter name-Parameter value) The testing temperatures(1) were 298(2), 473(2) and 573 K(2).
Tab.7 Examples of relation types in the Al-Si alloy relation extraction dataset
Package | Version
CUDA | 10.2
cuDNN | 7.6.5
Python | 3.6.12
TensorFlow | 1.15.0
gensim | 3.4.0
numpy | 1.19.4
sklearn | 0.22
prettytable | 0.7.0
pandas | 0.24.2
Tab.8 Model running environment and version numbers
Dataset | NER Pr | NER Re | NER F1 | RE Pr | RE Re | RE F1 | Overall F1
Dataset of this study | 66.2 | 61.7 | 63.9 | 53.5 | 44.2 | 48.5 | 56.2
CoNLL-2004 | 67.7 | 68.7 | 68.2 | 54.1 | 47.8 | 54.8 | 61.5
Tab.9 Comparison of experimental results on the two datasets (%)
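In Tab.9 the Overall column equals the mean of the NER F1 and RE F1 values ((63.9 + 48.5)/2 = 56.2 and (68.2 + 54.8)/2 = 61.5), and each NER F1 is the harmonic mean of its precision and recall. The check below reproduces both relationships from the reported numbers; the helper names are hypothetical.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall, both given in percent."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def overall_f1(ner_f1, re_f1):
    """Overall score as the plain average of the NER and RE F1 values."""
    return (ner_f1 + re_f1) / 2

# (NER Pr, NER Re, NER F1, RE F1, Overall F1) as reported in Tab.9
ROWS = {
    "Al-Si dataset": (66.2, 61.7, 63.9, 48.5, 56.2),
    "CoNLL-2004": (67.7, 68.7, 68.2, 54.8, 61.5),
}

for name, (pr, re_, ner, rel, total) in ROWS.items():
    print(name, round(f1(pr, re_), 1), ner, round(overall_f1(ner, rel), 1), total)
```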
Model | NER Pr | NER Re | NER F1 | RE Pr | RE Re | RE F1 | Overall F1
att_Multi_head | 71.3 | 64.4 | 67.7 | 65.2 | 49.5 | 56.3 | 62.0
Multi-head | 66.2 | 61.7 | 63.9 | 53.5 | 44.2 | 48.5 | 56.2
Difference | +5.1 | +2.7 | +3.8 | +11.7 | +5.3 | +7.8 | +5.8
Tab.10 Comparison of experimental results between Multi-head model and att_Multi_head model (%)
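The Multi-head baseline follows Bekoulis et al., who cast relation extraction as multi-head selection: for a token j, every candidate head token i and relation r receives an independent sigmoid probability computed from the two tokens' representations through a small feed-forward layer. A minimal sketch of that scoring for one token pair and one relation is given below, assuming relation-specific weights U, W, b, V and a ReLU activation; in att_Multi_head the self-attention layer would presumably refine the token representations before this step, but that placement is an assumption. The function names and toy weights are hypothetical.

```python
import math

def relu(xs):
    return [max(0.0, x) for x in xs]

def linear(matrix, vec):
    """Multiply a (rows x cols) matrix, given as a list of rows, by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def head_probability(z_j, z_i, params):
    """P(head = token i, relation = r | token j) in a multi-head selection layer.

    Assumed form: s = V . relu(U z_j + W z_i + b), probability = sigmoid(s).
    Each (head, relation) pair is scored independently, so a token may keep several heads.
    """
    hidden = relu([u + w + b for u, w, b in
                   zip(linear(params["U"], z_j), linear(params["W"], z_i), params["b"])])
    score = sum(v * h for v, h in zip(params["V"], hidden))
    return 1.0 / (1.0 + math.exp(-score))

# Toy parameters for a single relation type (illustrative values, not trained weights)
params = {"U": [[1.0, 0.0], [0.0, 1.0]], "W": [[1.0, 0.0], [0.0, 1.0]],
          "b": [0.0, 0.0], "V": [1.0, 1.0]}
p = head_probability([1.0, 0.0], [0.0, 1.0], params)
print(round(p, 4))
```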
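Entity | TP | FP | FN | Pr/% | Re/% | F1/%
Con (Content) | 15 | 0 | 0 | 100.00 | 100.00 | 100.00
Ele (Element) | 18 | 5 | 3 | 78.26 | 85.71 | 81.81
Alloy | 37 | 8 | 10 | 82.22 | 78.72 | 80.43
Exp (Experiment) | 36 | 13 | 13 | 73.46 | 73.46 | 73.46
Exp_r (Experiment result) | 0 | 0 | 1 | 0 | 0 | 0
Test_n (Test name) | 52 | 23 | 29 | 69.33 | 64.19 | 66.66
Test_v (Test value) | 6 | 7 | 12 | 46.15 | 33.33 | 38.70
Test_f (Test figure) | 18 | 9 | 9 | 66.66 | 66.66 | 66.66
Phase | 31 | 9 | 14 | 77.50 | 68.88 | 72.94
Par_n (Parameter name) | 13 | 10 | 23 | 56.52 | 36.11 | 44.06
Par_v (Parameter value) | 13 | 11 | 18 | 54.16 | 41.93 | 47.27
Total | 239 | 95 | 132 | 71.34 | 64.44 | 67.71
Tab.11 Experimental results for each entity type in the NER task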
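Each row of Tab.11 derives precision, recall, and F1 from the TP, FP, and FN counts. The helper below recomputes a few rows; the table appears to truncate rather than round at two decimals (for Ele, 36/44 gives 81.818..., listed as 81.81), and it reports 0 when a score is undefined, as for Exp_r. The function name is hypothetical.

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 in percent from raw counts (0.0 when undefined)."""
    precision = 100 * tp / (tp + fp) if tp + fp else 0.0
    recall = 100 * tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# A few rows of Tab.11: label -> (TP, FP, FN)
COUNTS = {"Ele": (18, 5, 3), "Alloy": (37, 8, 10), "Exp_r": (0, 0, 1), "Test_v": (6, 7, 12)}
for label, counts in COUNTS.items():
    print(label, [round(x, 2) for x in prf(*counts)])
```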
Relation | TP | FP | FN | Pr/% | Re/% | F1/%
Composition | 22 | 7 | 9 | 75.86 | 70.96 | 73.33
Experiment | 29 | 12 | 17 | 70.73 | 63.04 | 66.66
Test | 50 | 30 | 63 | 62.50 | 44.24 | 51.81
Parameter | 8 | 9 | 23 | 47.05 | 25.80 | 33.33
Total | 109 | 58 | 112 | 65.27 | 49.55 | 56.33
Tab.12 Experimental results for each relation category in the RE task
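Tab.12 reports relation results by the four categories of Tab.4 rather than by the 13 fine-grained labels. The sketch below shows one way such per-category counts could be gathered, assuming that a predicted relation is a true positive only when its entity 1, label, and entity 2 all match a gold relation; the triple format, entity IDs, and function name are hypothetical.

```python
from collections import Counter

# Category of each Brat relation label, following the first column of Tab.4
CATEGORY = {
    "Con-Ele": "composition", "Ele-Alloy": "composition",
    "Alloy-Exp": "experiment", "Exp-Exp_r": "experiment",
    "Exp_r-Par_n": "experiment", "Exp-Par_n": "experiment",
    "Alloy-Test_n": "test", "Test_n-Par_n": "test", "Test_n-Test_v": "test",
    "Test_n-Test_f": "test", "Test_n-Phase": "test", "Phase-Test_v": "test",
    "Par_n-Par_v": "parameter",
}

def category_counts(gold, predicted):
    """Per-category TP/FP/FN for relation triples (entity1_id, label, entity2_id).

    A predicted triple is a true positive only if it matches a gold triple exactly.
    """
    gold, predicted = set(gold), set(predicted)
    counts = {category: Counter() for category in set(CATEGORY.values())}
    for triple in predicted:
        counts[CATEGORY[triple[1]]]["TP" if triple in gold else "FP"] += 1
    for triple in gold - predicted:
        counts[CATEGORY[triple[1]]]["FN"] += 1
    return counts

gold = {("T1", "Alloy-Exp", "T2"), ("T3", "Par_n-Par_v", "T4")}
predicted = {("T1", "Alloy-Exp", "T2"), ("T5", "Con-Ele", "T6")}
for category, c in sorted(category_counts(gold, predicted).items()):
    print(category, dict(c))
```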