Multi-label news classification algorithm based on deep bi-directional classifier chains
Tian-lei HU 1, Hao-bo WANG 1, Wen-dong YIN 2
1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
2. School of Humanities, Zhejiang University, Hangzhou 310028, China
A bi-directional classifier chains algorithm based on deep neural networks was proposed for multi-label news classification, addressing the main problems of the traditional classifier chains method: the difficulty of determining the order of label dependencies, the inefficiency of ensemble models, and the inability to use complicated base classifiers. In the proposed method, a forward classifier chain captures the correlation between each label and all previous labels, and a backward classifier chain, starting from the output of the last base classifier in the forward chain, learns the correlations between each label and all other labels. A deep neural network is employed as the base classifier in order to explore non-linear label correlations and improve predictive performance. By integrating the mean square losses of the two classifier chains, the objective function is optimized with the stochastic gradient descent algorithm. The experimental results of the proposed method on the multi-label news classification dataset RCV1-v2 were compared with those of existing classifier chains methods and other multi-label algorithms. Results show that deep bi-directional classifier chains can significantly improve predictive performance.
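The paper gives no code here; the following is a minimal PyTorch sketch of the architecture the abstract describes, under stated assumptions: the per-label base network (one hidden ReLU layer), the hidden width, and the toy dimensions are illustrative choices, and the way the backward chain consumes the forward chain's outputs is one plausible reading of the abstract, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class ChainLink(nn.Module):
    """One base classifier in a chain: a small feed-forward network mapping
    the input features concatenated with label predictions to one label score."""
    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, z):
        return self.net(z)

class BiDirectionalChains(nn.Module):
    """Forward chain: link j conditions on x and the j previous predictions.
    Backward chain: starting from the forward chain's output, link j conditions
    on x and the current predictions of all other labels."""
    def __init__(self, n_features, n_labels, hidden=64):
        super().__init__()
        self.n_labels = n_labels
        self.fwd = nn.ModuleList(
            ChainLink(n_features + j, hidden) for j in range(n_labels))
        self.bwd = nn.ModuleList(
            ChainLink(n_features + n_labels - 1, hidden) for _ in range(n_labels))

    def forward(self, x):
        fwd = []
        for j in range(self.n_labels):
            fwd.append(self.fwd[j](torch.cat([x] + fwd, dim=1)))
        bwd = list(fwd)  # backward pass starts from the forward chain's outputs
        for j in reversed(range(self.n_labels)):
            others = [bwd[i] for i in range(self.n_labels) if i != j]
            bwd[j] = self.bwd[j](torch.cat([x] + others, dim=1))
        return torch.cat(fwd, dim=1), torch.cat(bwd, dim=1)

# One training step: sum of the mean-square losses of the two chains, SGD update.
model = BiDirectionalChains(n_features=300, n_labels=5)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(32, 300)                   # toy feature batch
y = torch.randint(0, 2, (32, 5)).float()   # toy binary label matrix
f, b = model(x)
loss = nn.functional.mse_loss(f, y) + nn.functional.mse_loss(b, y)
opt.zero_grad()
loss.backward()
opt.step()
```

At test time, the backward chain's output would serve as the final prediction, since each of its links has seen estimates of all other labels.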
Tian-lei HU, Hao-bo WANG, Wen-dong YIN. Multi-label news classification algorithm based on deep bi-directional classifier chains. Journal of Zhejiang University (Engineering Science), 2019, 53(11): 2110-2117.
Fig. 3 Experimental results of Precision@$K$ for different methods
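As background for the figure: Precision@$K$ for multi-label prediction counts, for each test instance, what fraction of the $K$ labels with the highest predicted scores are truly relevant, averaged over instances. A minimal NumPy sketch (the function name and toy example are ours):

```python
import numpy as np

def precision_at_k(y_true, y_score, k):
    """Fraction of the k highest-scored labels per instance that are truly
    relevant, averaged over all instances."""
    top_k = np.argsort(-y_score, axis=1)[:, :k]       # indices of the k largest scores
    hits = np.take_along_axis(y_true, top_k, axis=1)  # 1 where a top-k label is relevant
    return hits.mean()

# e.g. precision_at_k(np.array([[1, 0, 1]]), np.array([[0.9, 0.8, 0.1]]), k=2)
# -> 0.5, since only one of the two top-scored labels is relevant
```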
Fig. 4 Experimental results of learning rate sensitivity