Microblog sentiment analysis based on collaborative learning under loose conditions
SUN Nian1, LI Yu-qiang1, LIU Ai-hua2, LIU Chun1, LI Wei-wei1
1. School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430063, China;
2. School of Energy and Power Engineering, Wuhan University of Technology, Wuhan 430063, China
Traditional collaborative learning requires two completely redundant feature views, yet in most cases this redundancy cannot be achieved. To address this problem, a collaborative learning framework under loose conditions was proposed. A support vector machine and a long short-term memory network were used to build microblog feature views based on the vector model and the word vector model, respectively, and collaborative learning was conducted on these two models. A new selection strategy for unlabeled samples was proposed, which combines the uncertainty strategy from active learning with the maximum certainty factor, so that the information contained in unlabeled samples is fully used. The experimental results show that, compared with the traditional selection strategy, the proposed selection strategy improves the quality of the classifier, and that Chinese microblog sentiment analysis can be completed with the proposed collaborative learning framework under loose conditions.
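The selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the uncertainty strategy and the maximum certainty factor are combined, so the rule below (keep samples that one classifier labels with high confidence, then prefer those the other classifier is least sure about) is an assumption. The names `confidence`, `select_for_peer`, and `min_teacher_conf`, as well as the example probabilities, are hypothetical.

```python
from typing import List, Tuple


def confidence(p_pos: float) -> float:
    """Confidence of a binary prediction, scaled to [0, 1].

    p_pos is the predicted probability of the positive class;
    0.5 gives confidence 0, while 0.0 or 1.0 gives confidence 1.
    """
    return abs(p_pos - 0.5) * 2.0


def select_for_peer(
    teacher_probs: List[float],
    learner_probs: List[float],
    k: int,
    min_teacher_conf: float = 0.8,  # hypothetical threshold, not from the paper
) -> List[Tuple[int, int]]:
    """Pick up to k unlabeled samples to add to the learner's training set.

    Assumed rule: keep samples the teacher labels confidently, then
    rank them by how uncertain the learner is (most uncertain first).
    Returns (sample index, pseudo-label assigned by the teacher).
    """
    candidates = [
        (confidence(p_learner), i)
        for i, (p_teacher, p_learner) in enumerate(zip(teacher_probs, learner_probs))
        if confidence(p_teacher) >= min_teacher_conf
    ]
    candidates.sort()  # lowest learner confidence first
    return [(i, 1 if teacher_probs[i] >= 0.5 else 0) for _, i in candidates[:k]]


# Hypothetical positive-class probabilities from the two views
svm_probs = [0.95, 0.55, 0.05, 0.97, 0.50]
lstm_probs = [0.52, 0.90, 0.45, 0.99, 0.10]

# Samples the SVM view would hand to the LSTM view
print(select_for_peer(svm_probs, lstm_probs, k=2))  # [(0, 1), (2, 0)]
```

In a full co-training loop, the same function would be called a second time with the roles of the two classifiers swapped, and both classifiers would be retrained after each round.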
Received: 14 November 2017
Published: 23 August 2018
SUN Nian, LI Yu-qiang, LIU Ai-hua, LIU Chun, LI Wei-wei. Microblog sentiment analysis based on collaborative learning under loose conditions. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2018, 52(8): 1452-1460.