Journal of Zhejiang University (Engineering Science)
Short text expansion and classification based on pseudo-relevance feedback
WANG Meng, LIN Lan-fen, WANG Feng
College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
Abstract:

A classification method based on pseudo-relevance feedback (PFR) is proposed to address the sparseness problem in short text classification. Each short text is expanded with semantically similar content retrieved from the web, so that its meaning is preserved. Existing feature extraction methods for expanded corpora use only local features; the proposed method adds global feature extraction and combines global and local features into a better feature vector. This alleviates the highly sparse feature matrix that arises in classification because short texts have limited length. Experiments on an open dataset, together with a comparison against results reported in other literature, show that the method achieves good results on short text classification.

Key words: short text classification; feature extraction; pseudo-relevance feedback
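The expansion step described in the abstract can be illustrated with a minimal sketch of generic pseudo-relevance feedback. This is not the authors' implementation: the toy corpus, the TF-IDF weighting, the cosine ranking, and the names `expand`, `k`, and `n_terms` are all assumptions made for illustration, and the combination of local and global features is not covered because the abstract does not define it.

```python
import math
from collections import Counter

# Hypothetical stand-in for web pages retrieved for a short text.
CORPUS = [
    "apple releases new iphone with faster chip",
    "apple pie recipe with cinnamon and sugar",
    "new iphone camera review and battery test",
    "football league final score and highlights",
]

def tokenize(text):
    return text.lower().split()

def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def expand(short_text, corpus, k=2, n_terms=3):
    """Assume the top-k most similar documents are relevant (the
    'pseudo' in pseudo-relevance feedback) and append their
    heaviest unseen terms to the short text."""
    docs = [tokenize(short_text)] + [tokenize(d) for d in corpus]
    vecs = tfidf_vectors(docs)
    query, candidates = vecs[0], vecs[1:]
    scores = [cosine(query, c) for c in candidates]
    ranked = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    feedback = Counter()
    for i in ranked[:k]:
        if scores[i] > 0:  # skip documents sharing no term with the query
            feedback.update(candidates[i])
    seen = set(docs[0])
    by_weight = sorted(feedback.items(), key=lambda kv: (-kv[1], kv[0]))
    new_terms = [t for t, _ in by_weight if t not in seen][:n_terms]
    return docs[0] + new_terms

expanded = expand("new iphone", CORPUS)
print(expanded)
```

The expanded token list has more non-zero entries in a bag-of-words vector than the original two-word text, which is the sparsity effect the abstract refers to. Documents that share no term with the short text contribute nothing to the expansion.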
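Published: 2014-06-12
CLC number: TP 391
Funding: Doctoral Program Foundation (20110101110065); National Science and Technology Support Program of the 12th Five-Year Plan (2012BAD35B01-3, 2013BAF02B10).

Corresponding author: LIN Lan-fen, female, professor, doctoral supervisor. E-mail: llf@zju.edu.cn
About the author: WANG Meng (born 1986), male, Ph.D. candidate, working on natural language processing and data mining. E-mail: wangmeng@zju.edu.cn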
Cite this article:

WANG Meng, LIN Lan-fen, WANG Feng. Short text expansion and classification based on pseudo-relevance feedback[J]. Journal of Zhejiang University (Engineering Science), 10.3785/j.issn.1008-973X.2014.10.000.

Link to this article:

http://www.zjujournals.com/xueshu/eng/CN/10.3785/j.issn.1008-973X.2014.10.000
http://www.zjujournals.com/xueshu/eng/CN/Y2014/V48/I5/2

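WANG Meng, LIN Lan-fen, WANG Feng. Short text expansion and classification based on pseudo-relevance feedback[J]. Journal of Zhejiang University (Engineering Science), 2014, 48(10): 1835-1842.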