Short text expansion and classification based on pseudo-relevance feedback
A novel classification method based on pseudo-relevance feedback (PRF) was proposed to address the sparseness problem in short text classification. Each short text was expanded with web pages that are semantically similar to it, and the feature vector generation algorithm was modified to extract both local features and global features. Because short texts contain only a few words, the feature matrices built from them are typically very sparse; the proposed method alleviates this sparseness in the final feature matrix. Experimental results on a public dataset show that the method significantly improves short text classification performance compared with state-of-the-art methods.
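The expansion and feature-generation steps summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: a small in-memory `corpus` list stands in for web search results, term-frequency vectors with cosine similarity serve as the similarity measure, and "local" and "global" features are taken to be, respectively, raw term frequencies in the short text and normalized term frequencies pooled over the top-k pseudo-relevant documents, concatenated into one vector. All function names (`tokenize`, `cosine`, `pseudo_relevant_docs`, `expanded_features`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Simplistic tokenizer (assumption): lowercase, keep alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors (Counters).
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def pseudo_relevant_docs(short_text, corpus, k):
    # Rank documents by similarity to the short text and *assume* the top-k
    # are relevant: the pseudo-relevance feedback step. In the paper the
    # candidates are web pages; here a local list stands in for them.
    q = Counter(tokenize(short_text))
    ranked = sorted(corpus, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def expanded_features(short_text, corpus, vocab, k=2):
    # Local features (assumption): term frequencies in the short text itself.
    local = Counter(tokenize(short_text))
    # Global features (assumption): term frequencies pooled over the top-k
    # pseudo-relevant documents, normalized by the pooled token count.
    pooled = Counter()
    for doc in pseudo_relevant_docs(short_text, corpus, k):
        pooled.update(tokenize(doc))
    total = sum(pooled.values()) or 1
    local_vec = [float(local[t]) for t in vocab]
    global_vec = [pooled[t] / total for t in vocab]
    # Concatenate both views into a single feature vector for a classifier.
    return local_vec + global_vec


if __name__ == "__main__":
    # Toy corpus standing in for retrieved web pages (illustrative data only).
    corpus = [
        "apple iphone release new smartphone",
        "football match score goal",
        "smartphone battery apple review",
    ]
    vocab = ["apple", "phone", "smartphone", "goal"]
    print(expanded_features("apple phone", corpus, vocab, k=2))
```

In this toy run the short text "apple phone" has a local weight of zero for "smartphone", yet the global half of its vector assigns that term a nonzero weight because it occurs in both retrieved documents. That is the kind of sparseness reduction the expansion is meant to provide; a real system would replace the toy corpus with search-engine results and likely use a stronger weighting scheme than raw term frequency.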