J4, 2011, Vol. 45, Issue (2): 288-294. DOI: 10.3785/j.issn.1008-973X.2011.02.015
Computer Technology
Implicit product feature extraction through regularized topic modeling
QIU Guang¹, ZHENG Miao¹, ZHANG Hui², ZHU Jian-ke¹, BU Jia-jun¹, CHEN Chun¹, HANG Hang¹
1. Zhejiang Key Laboratory of Service Robot, College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China;
2. College of Computer Science, Zhejiang University of Technology, Hangzhou 310014, China
Abstract:

To address implicit product feature extraction in product opinion mining, we propose a novel regularized topic modeling framework that extends classical topic modeling. The framework is motivated by an analysis of how opinion words are distributed over different product features in reviews, and by the assumption that opinion words are topic-dependent. It incorporates opinion information into topic modeling through a regularizer defined on the similarity of opinion-word usage across reviews. The idea behind the regularization is that if two reviews use opinion words in a similar way, they are more likely to comment on the same product feature. Both qualitative and quantitative experiments show that the regularized framework achieves higher accuracy than classical topic modeling algorithms, which indicates that the regularization is effective.
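The abstract does not give the exact form of the regularizer, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a graph-regularized PLSA objective in the style of NetPLSA (Mei et al., WWW 2008): each review is a node, the edge weight between two reviews is the cosine similarity of their opinion-word count vectors, and the regularizer penalizes differences between the topic distributions p(z|d) of strongly connected reviews. The function names, the cosine weighting, and the trade-off parameter `lam` are hypothetical choices for illustration, not the authors' formulation. The sketch only evaluates the objective; fitting the model (for example with a generalized EM procedure) is not shown.

```python
import numpy as np


def opinion_similarity(opinion_counts: np.ndarray) -> np.ndarray:
    """Hypothetical edge weights: cosine similarity of reviews' opinion-word counts.

    opinion_counts: (n_reviews, n_opinion_words) count matrix.
    Returns an (n_reviews, n_reviews) symmetric matrix with a zero diagonal.
    """
    counts = np.asarray(opinion_counts, dtype=float)
    norms = np.linalg.norm(counts, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # reviews without opinion words get all-zero rows
    unit = counts / norms
    w = unit @ unit.T
    np.fill_diagonal(w, 0.0)  # no self-loops
    return w


def plsa_log_likelihood(doc_word, theta, phi) -> float:
    """L = sum_d sum_w c(w, d) * log sum_z p(z|d) p(w|z).

    doc_word: (n_reviews, vocab) counts; theta: (n_reviews, k) p(z|d); phi: (k, vocab) p(w|z).
    """
    p = theta @ phi        # p(w|d) for every review/word pair
    mask = doc_word > 0    # only observed words contribute
    return float(np.sum(doc_word[mask] * np.log(p[mask])))


def regularizer(theta, w) -> float:
    """R = 1/2 * sum_{u,v} w(u, v) * sum_z (p(z|u) - p(z|v))^2."""
    diff = theta[:, None, :] - theta[None, :, :]  # (n, n, k) pairwise differences
    sq_dist = np.sum(diff ** 2, axis=2)           # (n, n) squared Euclidean distances
    return 0.5 * float(np.sum(w * sq_dist))


def regularized_objective(doc_word, theta, phi, w, lam: float) -> float:
    """Assumed NetPLSA-style objective to maximize: (1 - lam) * L - lam * R."""
    return (1.0 - lam) * plsa_log_likelihood(doc_word, theta, phi) - lam * regularizer(theta, w)


if __name__ == "__main__":
    # Toy data: 3 reviews x 4 opinion words; reviews 0 and 1 use the same opinion words.
    opinion_counts = np.array([[2, 1, 0, 0],
                               [1, 2, 0, 0],
                               [0, 0, 3, 1]])
    w = opinion_similarity(opinion_counts)
    agree = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])     # reviews 0 and 1 share a topic
    disagree = np.array([[0.9, 0.1], [0.1, 0.9], [0.1, 0.9]])  # reviews 0 and 1 diverge
    print(round(regularizer(agree, w), 3), round(regularizer(disagree, w), 3))  # prints: 0.016 1.024
```

Reviews 0 and 1 share opinion words, so the penalty stays small when their topic distributions agree and grows when they diverge. Maximizing the combined objective therefore nudges reviews with similar opinion-word usage toward the same topic, which is the intuition the abstract describes.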

Published: 2011-03-17
CLC number: TP 391.1
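Funding:

Supported by the National Key Technology R&D Program of China (2008BAH26B00) and the Program for New Century Excellent Talents in University (NCET-09-0685).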

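Corresponding author: ZHU Jian-ke (born 1979), male, lecturer. E-mail: jkzhu@zju.edu.cn
About the author: QIU Guang (born 1983), male, from Xiangshan, Zhejiang; Ph.D. candidate whose research focuses mainly on data and information retrieval. E-mail: qiuguang@zju.edu.cn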
Cite this article:

QIU Guang, ZHENG Miao, ZHANG Hui, ZHU Jian-ke, BU Jia-jun, CHEN Chun, HANG Hang. Implicit product feature extraction through regularized topic modeling [J]. J4, 2011, 45(2): 288-294.

Link to this article:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2011.02.015
http://www.zjujournals.com/eng/CN/Y2011/V45/I2/288

[1] [1] HU Mingqing, LIU Bing. Mining and summarizing customer reviews [C]∥ Proceedings of SIGKDD’04. Seattle: ACM, 2004: 168-177.
[2] POPESCU A M, ETZIONI O. Extracting product features and opinions from reviews [C]∥ Proceedings of EMNLP’05. Vancouver: ACL, 2005: 339-346.
[3] GAMON M, AUE A, CORSTONOLIVER S, et al. Pulse: mining customer opinions from free text [C]∥ Proceedings of IDA’05. Madrid: Springer, 2005: 121-132.
[4] SCAFFIDI C, BIERHOFF K, CHANG E, et al. Red opal: productfeature scoring from reviews [C]∥ Proceedings of EC’07. California: ACM, 2007: 182-191.
[5] LIU Bing, HU Minqing, CHENG Junsheng. Opinion observer: analyzing and comparing opinions on the Web [C]∥ Proceedings of WWW’05. Chiba: ACM, 2005: 342-351.
[6] SU Qi, XU Xinying, GUO Honglei, et al. Hidden sentiment association in Chinese Web opinion mining [C]∥ Proceedings of WWW’08. Beijing: ACM, 2008: 959-968.
[7] MEI Qiaozhu, LING Xu, WONDRA M, et al. Topic sentiment mixture: modeling facets and opinions in Weblogs [C]∥ Proceedings of WWW’07. Banff: ACM, 2007: 171-180.
[8] HATZIVASSILOGLOU V, MCKEOWN K R. Predicting the semantic orientation of adjectives [C]∥ Proceedings of ACL’97. Madrid: ACL, 1997: 174-181.
[9] KU Lunwei, LIANG Yuting, CHEN Hsinhsi. Opinion extraction, summarization and tracking in news and blog corpora [C]∥AAAI Spring Symposia 2006 on Computational Approaches to Analyzing Weblogs. Boston: AAAI, 2006.
[10] PANG Bo, LEE L, VAITHYANATHAN S. Thumbs up? Sentiment classification using machine learning techniques [C]∥ Proceedings of EMNLP’02. Pennsylvania: ACL, 2002: 79-86.
[11] KIM S M, HOVY E. Determining the sentiment of opinions [C]∥Proceedings of COLING’04. Geneva: ACL, 2004: 1367-1373.
[12] SEKI Y. Opinion holder extraction from author and authority viewpoints [C]∥ Proceedings of SIGIR’07. Amsterdam: ACM, 2007: 841-842.
[13] TAN Songbo, ZHANG Jin. An empirical study of sentiment analysis for chinese documents [J]. Expert Systems with Applications, 2007, 34(4): 2622-2629.
[14] TURNEY P. Thumbs up or thumbs down? semantic orientation applied to unsupervised classification of reviews [C]∥ Proceedings of ACL’02. Pennsylvania: ACL, 2002: 417-424.
[15] YE Qiang, LIN Bin, LI Yijun. Sentiment classification for Chinese reviews: A comparison between SVM and semantic approaches [C]∥ Proceedings of the 4th international conference on machine learning and cybernetics. Guangzhou: IEEE, 2005.
[16] YE Qiang, SHI Wen, LI Yijun. Sentiment classification for movie reviews in Chinese by improved semantic oriented approach [C]∥ Proceedings of HICSS39. Hawaiian: IEEE, 2006.
[17] 姚天昉,程希文,徐飞玉,等.文本意见挖掘综述[J].中文信息学报,2008,22(3): 71-80.
YAO Tianfang, CHENG Xiwen, XU Feiyu, et al. A survey of opinion mining for texts [J]. Journal of Chinese Information Processing, 2008, 22(3): 71-80.
[18] HOFMANN T. Probabilistic latent semantic analysis [C]∥ Proceedings of UAI’99. California: ACM, 1999: 50-57.
[19] BLEI D M, NG A Y, JORADN M I. Latent dirichlet allocation [J]. The Journal of Machine Learning Research, 2003, 3(3): 993-1022.
[20] CAI Deng, MEI Qiaozhu, HAN Jiawei, et al. Modeling hidden topics on document manifold [C]∥ Proceedings of CIKM’08. California: ACM, 2008: 911-920.
[21] MEI Qiaozhu, CAI Deng, ZHANG Duo, et al. Topic modeling with network regularization [C]∥ Proceedings of WWW’08. Beijing: ACM, 2008: 101-110.
[22] DEMPSTER A P, LAIRD N M, RUBIN D B. Maximum likelihood from incomplete data via the em algorithm [J]. Journal of the Royal Statistical Society. Series B :Methodological, 1977, 39(1): 1-38.
[23] ZHU Xiaojin, GHAHRAMANI Z, LAFFERTY J D. Semisupervised learning using Gaussian fields and harmonic functions [C]∥ Proceedings of ICML’03. Washington: AAAI, 2003: 912-919.
[24] PRESS W H, FLANNERY B P, TEUKELSKY S A, et al. Numerical Recipes in C: the Art of Scientific Computing [M]. London: Cambridge University Press, 1992: 132.
[25] GEHLER P. Peter’s code and dataset page [EB/OL]. \
[2009-06-12\]. http:∥www.kyb.mpg.de/bs/people/pgehler/code/index.html
[26] BLEI D M. Latent dirichlet allocation in C [EB/OL]. \
[2009-06-12\]. http:∥www.cs.princeton.edu/~blei/lda-c/.
