Computer Technology and Control Engineering
Multi-label news classification algorithm based on deep bi-directional classifier chains |
Tian-lei HU 1, Hao-bo WANG 1, Wen-dong YIN 2
1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
2. School of Humanities, Zhejiang University, Hangzhou 310028, China
Abstract A bi-directional classifier chains algorithm based on deep neural networks was proposed for multi-label news classification, to address the problems of the traditional classifier chains method, i.e. the difficulty of determining the order of label dependencies, the inefficiency of ensemble models, and the inability to use complicated base classifiers. In the proposed method, a forward classifier chain is used to capture the correlation between each label and all preceding labels, and a backward classifier chain is introduced which, starting from the output of the last base classifier in the forward chain, learns the correlation between each label and all other labels. A deep neural network is employed as the base classifier in order to explore non-linear label correlations and improve predictive performance. By integrating the mean square losses of the two classifier chains, the objective function is optimized with the stochastic gradient descent algorithm. The proposed method was compared with current classifier chains methods and other multi-label algorithms on the multi-label news classification dataset RCV1-v2. Results show that the deep bi-directional classifier chains algorithm significantly improves predictive performance.
Received: 01 November 2018
Published: 21 November 2019
Multi-label news classification algorithm based on deep bi-directional classifier chains (Chinese abstract)
For the multi-label news classification problem, a bi-directional classifier chains algorithm based on deep neural networks is proposed to address the difficulty of determining the order of label dependencies, the low running efficiency of ensemble models, and the inability to use complex models as base classifiers in traditional classifier chains algorithms. The method uses a forward classifier chain to capture the dependence of each label on all preceding labels, and introduces a backward classifier chain that, starting from the output of the last base classifier of the forward chain, learns in reverse the correlation between each label and all other labels. Deep neural networks are used as base classifiers in order to extract non-linear label correlations and improve predictive performance. Combining the mean square errors of the two classifier chains, the objective function is optimized efficiently with the stochastic gradient descent algorithm. On the multi-label news classification dataset RCV1-v2, the proposed algorithm is compared with current mainstream classifier chain algorithms and other multi-label classification algorithms. Experimental results show that the deep bi-directional classifier chains algorithm can effectively improve predictive performance.
Keywords:
multi-label,
news classification,
deep learning,
neural network,
classifier chains
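The chained architecture summarized in the abstracts above can be sketched compactly. The snippet below is an illustrative sketch only, not the authors' implementation: it assumes PyTorch, models each base classifier as a small MLP with a sigmoid output, and sums the mean-square losses of the forward and backward chains for optimization by stochastic gradient descent; the hidden size, activation functions, and exact chaining details are assumptions not stated in the abstract.

```python
# Illustrative sketch of a deep bi-directional classifier chain:
# one small MLP per label, chained forward and then backward.
import torch
import torch.nn as nn


class DeepBiCC(nn.Module):
    def __init__(self, n_features, n_labels, hidden=128):
        super().__init__()
        # Forward chain: classifier j sees the features plus the j previous predictions.
        self.forward_chain = nn.ModuleList(
            nn.Sequential(nn.Linear(n_features + j, hidden), nn.ReLU(),
                          nn.Linear(hidden, 1), nn.Sigmoid())
            for j in range(n_labels))
        # Backward chain: classifier j sees the features plus predictions for all other labels.
        self.backward_chain = nn.ModuleList(
            nn.Sequential(nn.Linear(n_features + n_labels - 1, hidden), nn.ReLU(),
                          nn.Linear(hidden, 1), nn.Sigmoid())
            for _ in range(n_labels))

    def forward(self, x):
        # Forward pass: each prediction is appended to the input of the next classifier.
        fwd = []
        for clf in self.forward_chain:
            fwd.append(clf(torch.cat([x] + fwd, dim=1)))
        # Backward pass: start from the forward-chain outputs and re-predict each label
        # from the features and the current predictions of all other labels.
        preds = list(fwd)
        for j in reversed(range(len(self.backward_chain))):
            others = [p for k, p in enumerate(preds) if k != j]
            preds[j] = self.backward_chain[j](torch.cat([x] + others, dim=1))
        return torch.cat(fwd, dim=1), torch.cat(preds, dim=1)


def train_step(model, optimizer, x, y):
    # Combine the mean-square losses of the two chains and take one SGD step.
    fwd_out, bwd_out = model(x)
    loss = nn.functional.mse_loss(fwd_out, y) + nn.functional.mse_loss(bwd_out, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

A typical use would construct `model = DeepBiCC(n_features, n_labels)` and call `train_step(model, torch.optim.SGD(model.parameters(), lr=0.1), x_batch, y_batch)` with a 0/1 label matrix `y_batch`; at prediction time the backward-chain output can be thresholded at 0.5.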