Implicit product feature extraction through regularized topic modeling

doi:10.3785/j.issn.1008-973X.2011.02.015

2011, Vol. 45, Issue 2: 288-294

QIU Guang1, ZHENG Miao1, ZHANG Hui2, ZHU Jian-ke1,
BU Jia-jun1, CHEN Chun1, HANG Hang1

1. Zhejiang Key Laboratory of Service Robot, College of Computer Science, Zhejiang University, Hangzhou 310027,
China; 2. College of Computer Science, Zhejiang University of Technology, Hangzhou 310014, China

Abstract

To address implicit product feature extraction in product opinion mining, we propose a regularized topic modeling framework that extends classical topic modeling. The framework is motivated by an analysis of how opinion words are distributed across different product features in reviews, and by the assumption that opinion words are topic dependent. It incorporates opinion information through a regularizer defined on the similarity of opinion word usage between reviews. The basic idea of the regularization is that if two reviews use opinion words in similar ways, they are more likely to comment on the same features. Both qualitative and quantitative experiments show that the framework achieves higher accuracy than classical topic modeling algorithms, which indicates that the regularization is effective.
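The regularization idea can be illustrated with a minimal Python sketch. The abstract does not state the exact form of the regularizer, so everything below is an assumption made for illustration: cosine similarity over opinion-word counts, a squared-difference penalty on document-topic distributions, the trade-off weight `lam`, and all function names are hypothetical and are not the authors' implementation.

```python
import numpy as np

def opinion_similarity(opinion_counts):
    """Cosine similarity between reviews, using opinion-word counts only (assumed measure)."""
    counts = np.asarray(opinion_counts, dtype=float)
    norms = np.linalg.norm(counts, axis=1, keepdims=True)
    norms[norms == 0] = 1.0          # reviews without opinion words get zero similarity
    unit = counts / norms
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)       # a review does not regularize itself
    return sim

def regularizer(doc_topic, sim):
    """Similarity-weighted squared difference of topic distributions over review pairs."""
    theta = np.asarray(doc_topic, dtype=float)
    diff = theta[:, None, :] - theta[None, :, :]   # shape (n_reviews, n_reviews, n_topics)
    sq_dist = (diff ** 2).sum(axis=2)              # shape (n_reviews, n_reviews)
    return 0.5 * float((sim * sq_dist).sum())      # 0.5 because each pair is counted twice

def regularized_objective(log_likelihood, doc_topic, sim, lam=0.5):
    """Trade off topic-model fit against the penalty (assumed combination, to be maximized)."""
    return (1.0 - lam) * log_likelihood - lam * regularizer(doc_topic, sim)

# Toy data: 3 reviews, 3 opinion words, 2 topics (all values invented for the example).
counts = [[2, 1, 0], [1, 1, 0], [0, 0, 3]]
theta = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
sim = opinion_similarity(counts)
penalty = regularizer(theta, sim)
```

In this toy setup the first two reviews share opinion words and already have similar topic distributions, so the penalty is small. Assigning them different dominant topics would increase the penalty, which is the behavior the regularization is meant to discourage.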

Published: 17 March 2011

CLC:

TP 391.1

Cite this article:

QIU Guang, ZHENG Miao, ZHANG Hui, ZHU Jian-ke, BU Jia-jun, CHEN Chun, HANG Hang. Implicit product feature extraction through regularized topic modeling. J4, 2011, 45(2): 288-294.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2011.02.015 OR http://www.zjujournals.com/eng/Y2011/V45/I2/288

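Implicit product feature extraction based on regularized topic modeling

To extract implicit product features in product opinion mining, a new topic modeling framework based on regularization is proposed. It builds on traditional topic modeling, on an analysis of the opinion word distributions that correspond to different product features in reviews, and on the assumption that opinion words are topic dependent. In this framework, the opinion word information in reviews is incorporated into traditional topic modeling through a regularization term defined on the similarity of opinion word usage between reviews. The basic idea of the regularization is that the more similar two reviews are in their opinion word usage patterns, the more likely they are to comment on the same product features. Both qualitative and quantitative experiments show that the regularized topic model achieves higher accuracy than traditional topic modeling algorithms, which indicates that the regularization idea is effective.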

［1］［1］ HU Mingqing, LIU Bing. Mining and summarizing customer reviews ［C］∥ Proceedings of SIGKDD’04. Seattle: ACM, 2004: 168-177.
［2］ POPESCU A M, ETZIONI O. Extracting product features and opinions from reviews ［C］∥ Proceedings of EMNLP’05. Vancouver: ACL, 2005: 339-346.
［3］ GAMON M, AUE A, CORSTONOLIVER S, et al. Pulse: mining customer opinions from free text ［C］∥ Proceedings of IDA’05. Madrid: Springer, 2005: 121-132.
［4］ SCAFFIDI C, BIERHOFF K, CHANG E, et al. Red opal: productfeature scoring from reviews ［C］∥ Proceedings of EC’07. California: ACM, 2007: 182-191.
［5］ LIU Bing, HU Minqing, CHENG Junsheng. Opinion observer: analyzing and comparing opinions on the Web ［C］∥ Proceedings of WWW’05. Chiba: ACM, 2005: 342-351.
［6］ SU Qi, XU Xinying, GUO Honglei, et al. Hidden sentiment association in Chinese Web opinion mining ［C］∥ Proceedings of WWW’08. Beijing: ACM, 2008: 959-968.
［7］ MEI Qiaozhu, LING Xu, WONDRA M, et al. Topic sentiment mixture: modeling facets and opinions in Weblogs ［C］∥ Proceedings of WWW’07. Banff: ACM, 2007: 171-180.
［8］ HATZIVASSILOGLOU V, MCKEOWN K R. Predicting the semantic orientation of adjectives ［C］∥ Proceedings of ACL’97. Madrid: ACL, 1997: 174-181.
［9］ KU Lunwei, LIANG Yuting, CHEN Hsinhsi. Opinion extraction, summarization and tracking in news and blog corpora ［C］∥AAAI Spring Symposia 2006 on Computational Approaches to Analyzing Weblogs. Boston: AAAI, 2006.
［10］ PANG Bo, LEE L, VAITHYANATHAN S. Thumbs up？ Sentiment classification using machine learning techniques ［C］∥ Proceedings of EMNLP’02. Pennsylvania: ACL, 2002: 79-86.
［11］ KIM S M, HOVY E. Determining the sentiment of opinions ［C］∥Proceedings of COLING’04. Geneva: ACL, 2004: 1367-1373.
［12］ SEKI Y. Opinion holder extraction from author and authority viewpoints ［C］∥ Proceedings of SIGIR’07. Amsterdam: ACM, 2007: 841-842.
［13］ TAN Songbo, ZHANG Jin. An empirical study of sentiment analysis for chinese documents ［J］. Expert Systems with Applications, 2007, 34(4): 2622-2629.
［14］ TURNEY P. Thumbs up or thumbs down？ semantic orientation applied to unsupervised classification of reviews ［C］∥ Proceedings of ACL’02. Pennsylvania: ACL, 2002: 417-424.
［15］ YE Qiang, LIN Bin, LI Yijun. Sentiment classification for Chinese reviews: A comparison between SVM and semantic approaches ［C］∥ Proceedings of the 4th international conference on machine learning and cybernetics. Guangzhou: IEEE, 2005.
［16］ YE Qiang, SHI Wen, LI Yijun. Sentiment classification for movie reviews in Chinese by improved semantic oriented approach ［C］∥ Proceedings of HICSS39. Hawaiian: IEEE, 2006.
［17］姚天昉,程希文,徐飞玉,等.文本意见挖掘综述［J］.中文信息学报,2008,22(3): 71-80.
YAO Tianfang, CHENG Xiwen, XU Feiyu, et al. A survey of opinion mining for texts ［J］. Journal of Chinese Information Processing, 2008, 22(3): 71-80.
［18］ HOFMANN T. Probabilistic latent semantic analysis ［C］∥ Proceedings of UAI’99. California: ACM, 1999: 50-57.
［19］ BLEI D M, NG A Y, JORADN M I. Latent dirichlet allocation ［J］. The Journal of Machine Learning Research, 2003, 3(3): 993-1022.
［20］ CAI Deng, MEI Qiaozhu, HAN Jiawei, et al. Modeling hidden topics on document manifold ［C］∥ Proceedings of CIKM’08. California: ACM, 2008: 911-920.
［21］ MEI Qiaozhu, CAI Deng, ZHANG Duo, et al. Topic modeling with network regularization ［C］∥ Proceedings of WWW’08. Beijing: ACM, 2008: 101-110.
［22］ DEMPSTER A P, LAIRD N M, RUBIN D B. Maximum likelihood from incomplete data via the em algorithm ［J］. Journal of the Royal Statistical Society. Series B ：Methodological, 1977, 39(1): 1-38.
［23］ ZHU Xiaojin, GHAHRAMANI Z, LAFFERTY J D. Semisupervised learning using Gaussian fields and harmonic functions ［C］∥ Proceedings of ICML’03. Washington: AAAI, 2003: 912-919.
［24］ PRESS W H, FLANNERY B P, TEUKELSKY S A, et al. Numerical Recipes in C: the Art of Scientific Computing ［M］. London: Cambridge University Press, 1992: 132.
［25］ GEHLER P. Peter’s code and dataset page ［EB/OL］. \
[2009-06-12\]. http:∥www.kyb.mpg.de/bs/people/pgehler/code/index.html
［26］ BLEI D M. Latent dirichlet allocation in C ［EB/OL］. \
[2009-06-12\]. http:∥www.cs.princeton.edu/～blei/lda-c/.
