JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE)
    
Cross-document personal name disambiguation merging sentential semantic analysis
ZHANG Han, LUO Sen-lin, ZOU Li-li, SHI Xiu-min
School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China

Abstract  

A multi-stage personal name disambiguation algorithm was proposed based on the construction of a text feature space. Because query terms often occur as common words, heuristic rules were applied after document pre-processing to determine whether the query term refers to a personal name. Named entities and occupations were then extracted according to feature templates. A sentential semantic model was used for sentential semantic analysis and for the extraction of sentential semantic features, and word frequencies were counted according to the bag-of-words model. Together these features formed a three-layer feature space. Rule-based classification and a two-stage hierarchical clustering algorithm were used to perform the name disambiguation. The overlap coefficient was introduced to compute the similarity of the sentential semantic features. Experiments on the dataset of the CLP2012 Chinese personal name disambiguation task showed that the F-score reached 88.79%, indicating that the proposed approach can improve the performance of cross-document personal name disambiguation.
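
The similarity and clustering steps named in the abstract can be illustrated with a minimal Python sketch. The overlap coefficient is the standard definition, |A ∩ B| / min(|A|, |B|). Everything else here is an assumption made for illustration and is not taken from the paper: the feature sets, the average-link merging rule, the threshold parameter, and the function names are hypothetical, and only a single clustering stage is shown rather than the two-stage procedure.

```python
from itertools import combinations


def overlap_coefficient(a, b):
    """Overlap coefficient of two feature collections: |A & B| / min(|A|, |B|)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0  # convention assumed here for empty feature sets
    return len(a & b) / min(len(a), len(b))


def cluster_similarity(c1, c2, features):
    """Average-link similarity between two clusters of document ids (assumed linkage)."""
    pairs = [(i, j) for i in c1 for j in c2]
    total = sum(overlap_coefficient(features[i], features[j]) for i, j in pairs)
    return total / len(pairs)


def agglomerative_cluster(features, threshold):
    """Merge the most similar pair of clusters until no pair reaches `threshold`."""
    clusters = [[doc_id] for doc_id in sorted(features)]
    while len(clusters) > 1:
        best_sim, (x, y) = max(
            (cluster_similarity(clusters[p], clusters[q], features), (p, q))
            for p, q in combinations(range(len(clusters)), 2)
        )
        if best_sim < threshold:
            break
        # x < y always holds, so deleting index y leaves index x valid.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


if __name__ == "__main__":
    # Hypothetical feature sets for four documents mentioning the same name.
    docs = {
        "d1": {"professor", "Beijing", "physics"},
        "d2": {"professor", "physics", "lecture", "university"},
        "d3": {"striker", "goal", "league"},
        "d4": {"goal", "league", "coach", "season"},
    }
    print(agglomerative_cluster(docs, threshold=0.5))
```

Because the overlap coefficient normalizes by the smaller set, a document with few extracted features can still match a feature-rich document strongly, which is one plausible reason to prefer it over Jaccard similarity when feature sets vary widely in size.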



Published: 01 April 2015
CLC:  TP 391  
Cite this article:

ZHANG Han, LUO Sen-lin, ZOU Li-li, SHI Xiu-min. Cross-document personal name disambiguation merging sentential semantic analysis. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2015, 49(4): 717-723.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2015.04.016     OR     http://www.zjujournals.com/eng/Y2015/V49/I4/717


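Original Chinese title: 融合句义分析的跨文本人名消歧

Building on a constructed text feature space, a three-stage personal name disambiguation method that incorporates sentential semantic analysis was proposed. Because query terms often appear as ordinary words, the method applies heuristic-rule post-processing after text pre-processing to decide whether a query term refers to a personal name, and it extracts local named-entity features and occupations according to feature templates. Sentential semantic analysis is performed with a sentential semantic structure model to extract sentential semantic features, and word frequencies are counted with a bag-of-words model; together these form a three-layer feature space. Rule-based classification and a two-stage hierarchical clustering algorithm then carry out the disambiguation. The overlap coefficient is introduced to compute the similarity of sentential semantic features. In experiments on the CLP2012 Chinese personal name disambiguation corpus, F reached 88.79%, showing that applying sentential semantic analysis to cross-document personal name disambiguation works well.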

