Journal of Zhejiang University (Engineering Science)
Communication Engineering, Automation Technology
Cross-document personal name disambiguation merging sentential semantic analysis
ZHANG Han (张晗), LUO Sen-lin (罗森林), ZOU Li-li (邹丽丽), SHI Xiu-min (石秀民)
School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China
Abstract:

A three-stage personal name disambiguation method incorporating sentential semantic analysis was proposed, based on the construction of a text feature space. Because query terms often occur as common words rather than as names, heuristic post-processing rules were applied after document pre-processing to determine whether the query term denotes a personal name. Local named-entity features and occupations were then extracted according to feature templates. A sentential semantic structure model was used to analyze sentences and extract sentential semantic features, and word frequencies were counted with a bag-of-words model; together these formed a three-layer feature space. Rule-based classification and a two-stage hierarchical clustering algorithm were used to perform the disambiguation. The overlap coefficient was introduced to compute the similarity of sentential semantic features. Experiments on the CLP2012 Chinese personal name disambiguation corpus showed that F reached 88.79%, indicating that sentential semantic analysis is effective for cross-document personal name disambiguation.
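
The overlap coefficient mentioned in the abstract can be illustrated with a minimal sketch. It uses the standard Szymkiewicz-Simpson definition, |A ∩ B| / min(|A|, |B|), on two feature sets. The function name, the example feature sets, and the choice to return 0.0 for an empty set are assumptions made for illustration; the abstract does not state how the authors represent sentential semantic features or handle empty inputs.

```python
def overlap_coefficient(a, b):
    """Overlap coefficient of two feature collections: |A & B| / min(|A|, |B|)."""
    a, b = set(a), set(b)
    if not a or not b:
        # Assumed convention: an empty feature set is similar to nothing.
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Hypothetical feature sets for three documents that mention the same name.
doc_a = {"professor", "university", "publish", "paper"}
doc_b = {"professor", "university", "lecture"}
doc_c = {"striker", "club", "goal"}

print(overlap_coefficient(doc_a, doc_b))  # 2 shared features / min(4, 3)
print(overlap_coefficient(doc_a, doc_c))  # no shared features
```

Because the denominator is the size of the smaller set, a short feature set that is fully contained in a longer one scores 1.0, which may suit documents of very different lengths better than the Jaccard index would.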

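Published: 2015-04-01
CLC number: TP 391
Funding:

Supported by the National "242" Information Security Program of China (2005C48) and the Major Project Cultivation Special Fund of the Beijing Institute of Technology Science and Technology Innovation Program (2011CX01015)

Corresponding author: LUO Sen-lin, male, professor     E-mail: luosenlin@bit.edu.cn
About the author: ZHANG Han (b. 1989), male, master's student, working on Chinese information processing. E-mail: han445812769@126.com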
Cite this article:

ZHANG Han, LUO Sen-lin, ZOU Li-li, SHI Xiu-min. Cross-document personal name disambiguation merging sentential semantic analysis [J]. Journal of Zhejiang University (Engineering Science), 10.3785/j.issn.1008-973X.2015.04.016.

Link to this article:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2015.04.016
http://www.zjujournals.com/eng/CN/Y2015/V49/I4/717
