Front. Inform. Technol. Electron. Eng.  2015, Vol. 16 Issue (7): 541-552    DOI: 10.1631/FITEE.1400405
    
BUEES: a bottom-up event extraction system
Xiao Ding, Bing Qin, Ting Liu
Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, Harbin 150001, China

Abstract  Traditional event extraction systems focus mainly on event type identification and event participant extraction based on pre-specified event type paradigms and manually annotated corpora. However, different domains have different event type paradigms. When transferring to a new domain, we have to build a new event type paradigm and annotate a new corpus from scratch. This kind of conventional event extraction system requires massive human effort, and hence prevents event extraction from being widely applicable. In this paper, we present BUEES, a bottom-up event extraction system, which extracts events from the web in a completely unsupervised way. The system automatically builds an event type paradigm in the input corpus, and then proceeds to extract a large number of instance patterns of these events. Subsequently, the system extracts event arguments according to these patterns. By conducting a series of experiments, we demonstrate the good performance of BUEES and compare it to a state-of-the-art Chinese event extraction system, i.e., a supervised event extraction system. Experimental results show that BUEES performs comparably (5% higher F-measure in event type identification and 3% higher F-measure in event argument extraction), but without any human effort.
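As a rough illustration of how an event type paradigm might be built automatically, the sketch below groups candidate trigger words into event types by clustering. It is a minimal sketch under stated assumptions, not the system's actual algorithm: the `SIMILARITY` table, its values, the `cluster_triggers` function, and the 0.6 threshold are all hypothetical, and a real system would take word similarities from a lexical resource rather than a hand-written table.

```python
from itertools import combinations

# Hypothetical pairwise similarities between candidate trigger words.
# These values are invented for illustration only.
SIMILARITY = {
    frozenset(("acquire", "buy")): 0.9,
    frozenset(("acquire", "merge")): 0.7,
    frozenset(("release", "publish")): 0.85,
    frozenset(("buy", "release")): 0.1,
}


def similarity(a, b):
    """Look up the toy similarity; unknown pairs count as unrelated."""
    return SIMILARITY.get(frozenset((a, b)), 0.0)


def cluster_triggers(words, threshold=0.6):
    """Single-link clustering: merge two clusters whenever any
    cross-cluster word pair reaches the similarity threshold."""
    clusters = [{w} for w in words]
    merged = True
    while merged:
        merged = False
        for c1, c2 in combinations(clusters, 2):
            if any(similarity(a, b) >= threshold for a in c1 for b in c2):
                clusters.remove(c1)
                clusters.remove(c2)
                clusters.append(c1 | c2)
                merged = True
                break  # restart the scan over the updated cluster list
    return clusters


# Each resulting cluster stands for one automatically discovered event type.
event_types = cluster_triggers(["acquire", "buy", "merge", "release", "publish"])
```

With this toy table the five words fall into two groups, one around acquisition and one around publication. Single-link merging is chosen only because it is short to write; any clustering method over a word similarity measure would serve the same illustrative purpose.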

Key words: Event extraction, Unsupervised learning, Bottom-up
Received: 27 November 2014      Published: 06 July 2015
CLC:  TP391  
Cite this article:

Xiao Ding, Bing Qin, Ting Liu. BUEES: a bottom-up event extraction system. Front. Inform. Technol. Electron. Eng., 2015, 16(7): 541-552.

URL:

http://www.zjujournals.com/xueshu/fitee/10.1631/FITEE.1400405     OR     http://www.zjujournals.com/xueshu/fitee/Y2015/V16/I7/541


A bottom-up event extraction system

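Objective: This paper studies a bottom-up approach to event extraction. Without requiring a manually pre-specified event type paradigm, it automatically builds the event type paradigm, identifies event types, and extracts event arguments.
Innovation: This paper is the first to propose a clustering-based method for automatically discovering event types. Unlike traditional event extraction techniques, the method needs neither predefined event types nor prior domain knowledge. It is therefore an attempt at domain transfer, and is especially suited to domains with limited knowledge and resources.
Methods: Based on the observation that predicate verbs are key units for describing domain events, the method uses dependency parsing information to extract domain event words and clusters them with HowNet (知网) to obtain different event types (Fig. 2); event arguments are then extracted. The paper proposes a bootstrapping-based framework for event argument extraction with three core parts: (1) Pattern acquisition: event seeds are used as queries on the web to retrieve event instances, and initial event patterns are generated from these instances according to certain rules (Fig. 3). (2) Pattern generalization: the initial patterns are too rigid and miss many event matches, so the paper designs a generalization method that relaxes the original patterns according to certain rules, raising recall as far as possible while keeping precision unchanged (Algorithm 3). (3) Pattern filtering: generalized patterns introduce some noise, so the paper proposes a set of filtering rules to reduce the noise that generalization brings (Table 3).
Conclusions: A bottom-up event extraction system is proposed. On the public ACE corpus, the system outperforms the best current baseline method. It also achieves excellent results on manually constructed music-domain and finance-domain datasets, which indicates that the method adapts well to new domains.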

Key words: Event extraction, Unsupervised learning, Bottom-up
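The three bootstrapping stages named in the Methods summary (pattern acquisition, generalization, filtering) can be pictured with a toy token-level sketch. Everything here is an assumption made for illustration: the function names, the `<ROLE>` slot notation, the one-token `*` wildcard, and the wildcard-ratio filter are hypothetical and do not reproduce the paper's rules, Algorithm 3, or Table 3.

```python
def acquire_pattern(tokens, args):
    """Pattern acquisition: replace each argument token of an event
    instance with a role slot, e.g. 'Google' -> '<BUYER>'."""
    value_to_slot = {value: f"<{role}>" for role, value in args.items()}
    return tuple(value_to_slot.get(tok, tok) for tok in tokens)


def generalize(pattern, trigger):
    """Pattern generalization: keep the slots and the trigger word,
    and turn every other token into a one-token wildcard '*'."""
    return tuple(
        tok if tok == trigger or tok.startswith("<") else "*"
        for tok in pattern
    )


def match(pattern, tokens):
    """Return {role: token} if the pattern matches the sentence
    token by token, otherwise None."""
    if len(pattern) != len(tokens):
        return None
    found = {}
    for pat, tok in zip(pattern, tokens):
        if pat.startswith("<"):
            found[pat[1:-1]] = tok      # slot: capture the argument
        elif pat != "*" and pat != tok:
            return None                 # literal token mismatch
    return found


def keep_pattern(pattern, max_wildcard_ratio=0.5):
    """Pattern filtering (hypothetical rule): discard patterns that
    consist mostly of wildcards, since they are likely to be noisy."""
    return pattern.count("*") / len(pattern) <= max_wildcard_ratio


# A seed event instance with its known arguments.
seed = ["Google", "will", "acquire", "YouTube"]
rigid = acquire_pattern(seed, {"BUYER": "Google", "TARGET": "YouTube"})
general = generalize(rigid, trigger="acquire")

# A new sentence that the rigid pattern misses but the general one matches.
new_sentence = ["Oracle", "may", "acquire", "Sun"]
rigid_result = match(rigid, new_sentence)
general_result = match(general, new_sentence)
```

The rigid pattern fails on the new sentence because "may" differs from "will", while the generalized pattern recovers both arguments. This mirrors the recall gain that generalization is meant to provide, and the filter shows in miniature why over-generalized patterns need to be pruned afterwards.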