J4  2011, Vol. 45 Issue (2): 288-294    DOI: 10.3785/j.issn.1008-973X.2011.02.015
Implicit product feature extraction through regularized topic modeling
QIU Guang1, ZHENG Miao1, ZHANG Hui2, ZHU Jian-ke1,
BU Jia-jun1, CHEN Chun1, HANG Hang1
1. Zhejiang Key Laboratory of Service Robot, College of Computer Science, Zhejiang University, Hangzhou 310027,
China; 2. College of Computer Science, Zhejiang University of Technology, Hangzhou 310014, China


To solve the implicit product feature extraction task in product opinion mining, we propose a regularized topic modeling framework built on classical topic modeling. The framework is motivated by an analysis of how opinion words are distributed across different product features in reviews, and by the assumption that opinion words are topic-dependent. It incorporates opinion information through a regularizer defined on the similarity of opinion word usage between reviews. The underlying idea is that two reviews that use opinion words in similar ways are more likely to comment on the same features. Both qualitative and quantitative experiments show that the framework is more accurate than classical topic modeling algorithms, which indicates that the regularization is effective.
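The regularization idea can be illustrated with a small sketch. The abstract does not give the exact form of the regularizer, so everything below is an assumption made for illustration: cosine similarity over opinion-word counts, a squared-distance penalty between per-review topic distributions, and the names `opinion_vector`, `cosine_similarity`, and `regularizer` are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def opinion_vector(tokens, opinion_lexicon):
    """Count only the tokens that appear in the (assumed) opinion lexicon."""
    return Counter(t for t in tokens if t in opinion_lexicon)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def regularizer(topic_dists, opinion_vecs):
    """Assumed penalty: for every pair of reviews, the opinion-word
    similarity times the squared distance between their topic
    distributions. Similar reviews with different topics cost more."""
    total = 0.0
    n = len(topic_dists)
    for i in range(n):
        for j in range(i + 1, n):
            weight = cosine_similarity(opinion_vecs[i], opinion_vecs[j])
            dist2 = sum((p - q) ** 2
                        for p, q in zip(topic_dists[i], topic_dists[j]))
            total += weight * dist2
    return total


# Toy data (invented for this example, not from the paper's experiments).
lexicon = {"sharp", "blurry", "long", "short"}
reviews = [
    "the photos are sharp".split(),
    "sharp images every time".split(),
    "battery lasts a long time".split(),
]
vecs = [opinion_vector(r, lexicon) for r in reviews]

# Reviews 0 and 1 share the opinion word "sharp".
topics_agree = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]]     # 0 and 1 share a topic
topics_disagree = [[0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]  # 0 and 1 split apart

penalty_agree = regularizer(topics_agree, vecs)
penalty_disagree = regularizer(topics_disagree, vecs)
print(penalty_agree, penalty_disagree)
```

In this sketch the penalty is zero when the two reviews that share an opinion word also share a topic distribution, and positive when they are assigned to different topics. One way to use such a term, again as an assumption, is to subtract a weighted copy of it from the topic model's log-likelihood during fitting.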

Published: 17 March 2011
CLC:  TP 391.1  
Cite this article:

QIU Guang, ZHENG Miao, ZHANG Hui, ZHU Jian-ke, BU Jia-jun, CHEN Chun, HANG Hang. Implicit product feature extraction through regularized topic modeling. J4, 2011, 45(2): 288-294.




