Improving question classification via weighted feature model
HUANG Peng1, BU Jia-jun1, CHEN Chun1, KANG Zhi-ming1, CHEN Wei1, HU Hong-tao2
(1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China; 2. Zhejiang Lishui Electric Power Bureau, Lishui 323000, China)
Abstract A novel feature weighting model based on Internet and clustering techniques is proposed to mitigate the loss of feature information caused by the binary feature model used in most current question classification algorithms. In the proposed model, each feature is assigned a weight in the range 0~10 rather than a simple binary value. The weight quantifies the feature's contribution to question classification: the larger the weight, the greater the contribution. Experimental results show that the proposed feature weighting model outperforms the widely used binary feature model on the question classification task.
Published: 01 June 2009
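Improving question classification using a weighted feature model
To reduce the loss of feature information caused by the Boolean feature model adopted in most current question classification algorithms, a weighted feature model based on Internet and clustering techniques is proposed to represent the feature space of questions. Unlike the Boolean feature model, which assigns each feature a value of 0 or 1 to indicate whether it is present, the new model weights each feature with a real number in the interval 0~10 to distinguish the contributions of different features to question classification: the larger the weight, the greater the feature's contribution to distinguishing the question type. Experimental results show that the weighted feature model outperforms the previously widely used Boolean feature model in question classification.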
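The contrast between the two feature representations can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the weights are derived from Internet data and clustering, so the weight table, the vocabulary, the default weight of 1.0 for unlisted features, and the function names `binary_vector` and `weighted_vector` are all hypothetical. Only the 0~10 weight range and the 0/1 binary encoding come from the text.

```python
from typing import Dict, List


def binary_vector(features: List[str], vocabulary: List[str]) -> List[float]:
    """Binary (Boolean) model: 1.0 if the feature occurs in the question, else 0.0."""
    present = set(features)
    return [1.0 if term in present else 0.0 for term in vocabulary]


def weighted_vector(
    features: List[str],
    vocabulary: List[str],
    weights: Dict[str, float],
) -> List[float]:
    """Weighted model: a present feature contributes its weight, clamped to [0, 10]."""
    present = set(features)
    vector = []
    for term in vocabulary:
        if term in present:
            # Assumption: features without a listed weight default to 1.0.
            weight = weights.get(term, 1.0)
            vector.append(min(max(weight, 0.0), 10.0))
        else:
            vector.append(0.0)
    return vector


# Illustrative values only; they are not taken from the paper.
vocabulary = ["who", "where", "city", "the"]
weights = {"who": 9.0, "where": 8.5, "city": 6.0, "the": 0.5}
question = ["where", "the", "city"]

binary = binary_vector(question, vocabulary)
weighted = weighted_vector(question, vocabulary, weights)
```

In the binary vector, "where" and "the" are indistinguishable because both are 1.0. In the weighted vector they differ, and a downstream classifier can use that difference.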