Journal of Zhejiang University (Engineering Science)  2018, Vol. 52 Issue (8): 1452-1460    DOI: 10.3785/j.issn.1008-973X.2018.08.003
Computer Technology
Microblog sentiment analysis based on collaborative learning under loose conditions
SUN Nian¹, LI Yu-qiang¹, LIU Ai-hua², LIU Chun¹, LI Wei-wei¹
1. School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430063, China;
2. School of Energy and Power Engineering, Wuhan University of Technology, Wuhan 430063, China
Abstract:

Traditional collaborative learning algorithms require two sufficiently redundant feature views, a requirement that is not met in most cases. A collaborative learning framework under loose conditions is therefore proposed. The support vector machine (SVM) algorithm and the long short-term memory (LSTM) network are used to build, respectively, a microblog feature view based on the vector space model and a word-vector feature view based on semantic relatedness, and collaborative learning is carried out on these two views. For the selection of unlabeled samples, a strategy is proposed that combines the uncertainty strategy from active learning with the highest-confidence strategy from collaborative learning, so that the information contained in the unlabeled samples is exploited from different angles. Experimental results show that, for Chinese microblog sentiment polarity analysis, the proposed selection strategy improves classifier performance compared with traditional selection strategies, and that microblog sentiment analysis can be carried out with the collaborative learning framework under loose conditions.
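The selection strategy described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of the largest class probability as the confidence measure, and the choice to send the least confident samples to manual annotation are all assumptions made for illustration. The input is taken to be the class probabilities that one view's classifier (for example the SVM or the LSTM) assigns to each unlabeled sample.

```python
from typing import List, Sequence, Tuple


def confidence(probs: Sequence[float]) -> float:
    """Confidence of a prediction, taken here as the largest class probability."""
    return max(probs)


def select_samples(
    probs_per_sample: List[Sequence[float]],
    n_confident: int,
    n_uncertain: int,
) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Pick unlabeled samples by highest confidence and by highest uncertainty.

    Returns (confident, uncertain):
      confident: (sample index, predicted class) pairs for the n_confident
                 most confident samples (highest-confidence strategy).
      uncertain: indices of the n_uncertain least confident samples among
                 the rest (uncertainty strategy from active learning).
    """
    # Rank sample indices from most to least confident.
    ranked = sorted(
        range(len(probs_per_sample)),
        key=lambda i: confidence(probs_per_sample[i]),
        reverse=True,
    )
    confident_idx = ranked[:n_confident]
    remaining = ranked[n_confident:]
    # Reverse the remainder so the least confident samples come first.
    uncertain_idx = remaining[::-1][:n_uncertain]

    confident = []
    for i in confident_idx:
        probs = probs_per_sample[i]
        predicted = max(range(len(probs)), key=lambda c: probs[c])
        confident.append((i, predicted))
    return confident, uncertain_idx


# Hypothetical class probabilities (negative, positive) for five unlabeled posts.
example_probs = [
    [0.95, 0.05],
    [0.52, 0.48],
    [0.10, 0.90],
    [0.60, 0.40],
    [0.45, 0.55],
]
pseudo_labeled, to_annotate = select_samples(example_probs, 2, 2)
print(pseudo_labeled)  # [(0, 0), (2, 1)]
print(to_annotate)     # [1, 4]
```

In a collaborative learning loop, the pseudo-labeled samples chosen on one view would typically be added to the training set of the classifier on the other view, while the uncertain samples would be labeled by an annotator before being added. How the paper balances the two groups is not stated in the abstract, so the counts above are placeholders.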

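Received: 2017-11-14    Published: 2018-08-23
CLC:  TP391
Corresponding author: LI Yu-qiang, male, associate professor. orcid.org/0000-0002-9977-8001. E-mail: liyuqiang@whut.edu.cn
About the author: SUN Nian (1994-), female, master's student, working on machine learning and big data. orcid.org/0000-0003-4311-8933. E-mail: 853128958@qq.com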

Cite this article:

SUN Nian, LI Yu-qiang, LIU Ai-hua, LIU Chun, LI Wei-wei. Microblog sentiment analysis based on collaborative learning under loose conditions[J]. Journal of Zhejiang University (Engineering Science), 2018, 52(8): 1452-1460.

Link to this article:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2018.08.003
http://www.zjujournals.com/eng/CN/Y2018/V52/I8/1452

