
doi:10.3785/j.issn.1008-973X.2014.10.018

JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE)

Short text expansion and classification based on pseudo-relevance feedback

WANG Meng, LIN Lan-fen, WANG Feng

College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China


Abstract

A classification method based on pseudo-relevance feedback (PFR) is proposed to address feature sparseness in short text classification. Each short text is expanded with web pages that are semantically similar to it. The feature vector generation algorithm is modified to extract both local and global features. This alleviates the sparseness of the final feature matrix, which is common in short text classification because the texts contain so few words. Experimental results on an open dataset show that the method significantly improves short text classification performance compared with state-of-the-art methods.
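The pipeline in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define how similar web pages are retrieved or how local and global features are computed, so the sketch assumes a small in-memory corpus in place of web search, cosine similarity over term counts for retrieval, term frequency in the expanded text as the local feature, and smoothed inverse document frequency over the corpus as the global feature. The names `expand_short_text` and `feature_vector` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; the paper's tokenizer is not specified)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two term-count Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand_short_text(short_text, corpus, k=2):
    """Pseudo-relevance feedback: assume the top-k most similar documents
    are relevant and append their tokens to the short text."""
    query = Counter(tokenize(short_text))
    scored = [(cosine(query, Counter(tokenize(doc))), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    tokens = tokenize(short_text)
    for score, doc in scored[:k]:
        if score > 0:  # skip documents that share no terms with the query
            tokens.extend(tokenize(doc))
    return tokens


def feature_vector(tokens, corpus):
    """Weight each term by a local feature (term frequency in the expanded
    text) times a global feature (smoothed IDF over the corpus)."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(tokenize(doc)))
    term_freq = Counter(tokens)
    total = len(tokens)
    return {
        term: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
        for term, count in term_freq.items()
    }
```

With a three-document corpus such as `["apple releases new iphone model", "stock market falls on trade fears", "apple iphone sales rise"]`, the two-word text `"apple iphone"` grows from 2 tokens to 11, so the resulting vector has seven non-zero terms instead of two. That is the sense in which expansion makes the feature matrix less sparse.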

Published: 01 October 2014

CLC: TP 391


Cite this article:

WANG Meng, LIN Lan-fen, WANG Feng. Short text expansion and classification based on pseudo-relevance feedback. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2014, 48(10): 1835-1842.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2014.10.018 OR http://www.zjujournals.com/eng/Y2014/V48/I10/1835

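Short text expansion and classification based on pseudo-relevance feedback

To address the problem of short text classification, a short text expansion and classification method based on pseudo-relevance feedback (PFR) is proposed. The content of each short text is expanded with similar corpora from the Internet while its semantics are preserved. Existing feature extraction methods for expanded corpora, which use only local features, are improved by introducing global feature extraction. Combining global and local features yields better feature vectors and effectively resolves the high sparsity of the feature matrix caused by the limited length of short texts. Tests on an open dataset and a comparison with results reported in other literature verify that the method achieves good results on short text classification.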


