J4  2009, Vol. 43 Issue (6): 994-998    DOI: 10.3785/j.issn.1008-973X.2009.
Computer Technology, Automation Technology
Improving question classification via weighted feature model
HUANG Peng1, BU Jia-jun1, CHEN Chun1, KANG Zhi-ming1, CHEN Wei1, HU Hong-tao2
(1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China;
2. Zhejiang Lishui Electric Power Bureau, Lishui 323000, China)
Abstract:

Most current question classification algorithms use a binary (Boolean) feature model, which loses feature information. To reduce this loss, a weighted feature model based on Internet and clustering techniques is proposed to represent the feature space of questions. Whereas the binary model assigns each feature a value of 0 or 1 to indicate only whether the feature is present, the weighted model assigns each feature a real-valued weight in the range 0 to 10 that quantifies its contribution to question classification: the larger the weight, the more the feature contributes to distinguishing question types. Experimental results show that the proposed weighted feature model outperforms the widely used binary feature model on the question classification task.
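
The contrast between the two feature models can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract says only that weights are derived from Internet and clustering techniques, so the weights in `EXAMPLE_WEIGHTS`, the tiny vocabulary, and the function names `binary_features` and `weighted_features` are all hypothetical and chosen purely for illustration.

```python
from typing import Dict, List

# Hypothetical weights in the range 0 to 10. The paper derives its weights
# from Internet data and clustering; that procedure is not reproduced here.
EXAMPLE_WEIGHTS: Dict[str, float] = {"who": 9.0, "city": 7.5, "the": 0.5, "is": 0.3}

# Fixed feature order so that both vectors are comparable position by position.
VOCABULARY: List[str] = sorted(EXAMPLE_WEIGHTS)


def tokenize(question: str) -> set:
    """Lowercase, drop a trailing question mark, and split on whitespace."""
    return set(question.lower().rstrip("?").split())


def binary_features(question: str) -> List[float]:
    """Binary model: 1.0 if the feature occurs in the question, else 0.0."""
    tokens = tokenize(question)
    return [1.0 if term in tokens else 0.0 for term in VOCABULARY]


def weighted_features(question: str) -> List[float]:
    """Weighted model: the feature's weight if it occurs, else 0.0."""
    tokens = tokenize(question)
    return [EXAMPLE_WEIGHTS[term] if term in tokens else 0.0 for term in VOCABULARY]


if __name__ == "__main__":
    q = "Who is the mayor?"
    print(VOCABULARY)
    print(binary_features(q))
    print(weighted_features(q))
```

In the binary vector, "who", "is", and "the" are indistinguishable once present. In the weighted vector, under these assumed weights, "who" dominates, which is the kind of distinction the abstract attributes to the weighted model.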

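Published: 2009-06-01
CLC number: TP391
Corresponding author: BU Jia-jun, male, professor. E-mail: bjj@zju.edu.cn
About the author: HUANG Peng (1979-), male, born in Yichun, Jiangxi, Ph.D. candidate, working on information retrieval and natural language processing.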

Cite this article:

HUANG Peng, BU Jia-jun, CHEN Chun, et al. Improving question classification via weighted feature model [J]. J4, 2009, 43(6): 994-998.

Link to this article:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2009.
http://www.zjujournals.com/eng/CN/Y2009/V43/I6/994

