Please wait a minute...
JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE)
Computer Technology     
Hierarchical online NMF for detecting and tracking topics
TU Ding, CHEN Ling, CHEN Gen cai, WU Yong, WANG Jing chang
1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China;
2. Zhejiang Hongcheng Computer Systems Company Limited, Hangzhou 310009, China
Download:   PDF(1228KB) HTML
Export: BibTeX | EndNote (RIS)      

Abstract  

An online hierarchical non-negative matrix fraction method was proposed to address the problem of flat topic structure of text stream topic detecting methods. At every time slot, the topic nodes were oplited-splited according to the normalized discounted cumulative gain and a topic hierarchy was built by iteratively assigning documents to the most related topic nodes. The hierarchy construction process refers the previous topic hierarchy. The underlying assumption is that the topics are evolving among the similar topic nodes in different time slots. The method can detect and track topics in stream in an online way, which reveals many useful relationships between the topics. Experiments on Nist TDT2 dataset show that our method outperforms the contrasting methods under different metrics, e.g. NMI, Micro F1, MAP and NDCG, and uses less execution time.



Published: 01 August 2016
CLC:  TP 311  
Cite this article:

TU Ding, CHEN Ling, CHEN Gen cai, WU Yong, WANG Jing chang. Hierarchical online NMF for detecting and tracking topics. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2016, 50(8): 1618-1626.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2016.08.026     OR     http://www.zjujournals.com/eng/Y2016/V50/I8/1618


基于在线层次化非负矩阵分解的文本流主题检测

针对文本流主题检测中存在的主题结构扁平问题,提出在线的层次化非负矩阵分解方法,在每个时间片中根据归一化累计折损增益选择主题节点进行分解,接着反复将文档分配给最相关的主题节点构建主题层次,该过程中假设主题在由不同时间片中相似主题节点构成的序列中连续再演化,在当前时间片对主题节点进行分解时考虑过去时间片中主题节点的分解结果.该方法不仅能在线的发现和更新文本流中的主题,而且还可揭示主题间的结构关系.在Nist TDT2数据集上的实验结果表明,该方法在NMI、Micro F1、MAP和NDCG等指标下均显著超过了其他动态NMF方法,并在时间效率上显示出一定优势.

[1] BLEI D M, JOHN D L. Dynamic topic models [C]∥Proceedings of the 23rd International Conference on Machine Learning. Pittsburgh, Pennsylvania: ACM, 2006: 113-120.
[2] WANG C, BLEI D M, HECKERMAN D. Continuous time dynamic topic models [C]∥Uncertainty in Artificial Intelligence. Helsinki, Finland: AUAI Press, 2008:579-586.
[3] WANG X, ANDREW M. Topics over time: a nonmarkov continuoustime model of topical trends [C]∥Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Philadelphia, PA: ACM, 2006: 424-433.
[4] MAIRAL J, FRANCIS B, JEAN P, et al. Online learning for matrix factorization and sparse coding [J]. Journal of Machine Learning Research. 2010, 11: 19-60.
[5] SAHA A, VIKAS S. Learning evolving and emerging topics in social media: a dynamic nmf approach with temporal regularization [C]∥Proceedings of the Fifth ACM International Conference on Web Search and Data mining. Seattle, Washington: ACM, 2012: 693-702.
[6] VACA C K, AMIN M, ALEJANDRO J, MARCO S. A Timebased collective factorization for topic discovery and monitoring in news [C]∥Proceedings of the 23rd International Conference on World Wide Web. Seoul: ACM, 2014: 527-538.

 

 

 

 


[7] BLEI D M, NG A Y, JORDAN M I. Latent dirichlet allocation [J]. Journal of Machine Learning Research, 2003, 3: 993-1022.
[8] THOMAS H. Probabilistic latent semantic analysis [C]∥Uncertainty in artificial intelligence. Stockholm, Sweden: Morgan Kaufmann, 1999: 289-296.
[9] GAUSSIER E, GOUTTE C. Relation between PLSA and NMF and implications [C]∥Acm Sigir Conference on Research & Development in Information Retrieval. Salvador, Bahia, Brazil: ACM, 2005: 601-602.
[10] KUANG D, HAESUN P. Fast rank2 nonnegative matrix factorization for hierarchical document clustering [C]∥Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. Chicago, Illinois: ACM,2013: 739-747.
[11] JENATTON R, MAIRAL J, OBOZINSKI G, et al.. Proximal methods for sparse hierarchical dictionary learning [C]∥International Conference on Machine Learning. Haifa, Israel: Omnipress, 2010: 487-494.
[12] 景丽萍,朱岩,于剑.层次非负矩阵分解及在文本聚类中的应用[J].计算机科学与探索,2011, 5(10): 904-913.
JING Liping, ZHU Yan, YU jian. Hierarchical nonnegative matrix factorization for text clustering [J]. Journal of Frontiers of Computer Science & Technology, 2011, 5(10): 904-913.
[13] BLEI D M, GRIFFITHS T L, JORDAN M I, et al. Hierarchical topic models and the nested chinese restaurant process [C]∥Proceedings of the 17th Annual Conference on Neural Information Processing Systems. Vancouver, British Columbia, Canada: MIT Press, 2003: 17-24.
[14] HU L, LI JZ, ZHANG J, et al. oHETM: an online hierarchical entity topic model for news streams [C]∥ Proceedings of the 19th PacificAsia Conference on Knowledge Discovery and Data Mining. Ho Chi Minh City, Vietnam: Springer, 2015: 696-707.
[15] CAI D, MEI Q, HAN J, et al. Modeling hidden topics on document manifold [C]∥Proceedings of the 17th ACM conference on Information and knowledge management. Napa Valley, California: ACM, 2008: 911-920.
[16] CICHOCKI A, ZDUNEK R, PHAN A H, et al. Nonnegative matrix and tensor factorizations: applications to exploratory multiway data analysis and blind source separation [M]. Hoboken, New Jersey. John Wiley & Sons, 2009: 131-139.
[17] RITOV Y, BICKEL P J, TSYBAKOV A B. Simultaneous analysis of lasso and dantzig selector [J]. Annals of Statistics, 2009, 37(4):1705-1732.
[1] YUAN You-wei-, YU Jia, ZHENG Hong-sheng, WANG Jiao-jiao. Cloud workflow scheduling algorithm based on novelty ranking and multi-quality of service[J]. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2017, 51(6): 1190-1196.
[2] XU Rong-bin, SHI Jun, ZHANG Peng-fei, XIE Ying. Similarity measurement of transition mapping relation using Petri net[J]. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2017, 51(6): 1205-1213.
[3] WANG Haiyan, CHENG Yan . Dual service selection method based on coefficient of variation[J]. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2017, 51(6): 1197-1204.
[4] CHANG Chao, LIU Ke-sheng, TAN Long-dan, JIA Wen-chao. Data flow analysis for C program based on graph model[J]. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2017, 51(5): 1007-1015.
[5] WANG Ji kui . Bayesian conflicting Web data credibility algorithm[J]. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2016, 50(12): 2380-2385.
[6] YANG Sha, YE Zhen yu, WANG Shu gang, TAO Hai, LI Shi jian. Perception enhanced intelligent robotic arm system[J]. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2016, 50(6): 1155-1159.
[7] LUO Lin, SU Hong ye, BAN Lan. Nonparametric bayesian based on  mixture of dirichlet process in application of fault detection[J]. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2015, 49(11): 2230-2236.
[8] WANG Hong-hao, WANG Hui-quan, JIN Zhong-he. Rollback-able on-board software upgrade method based on incremental link[J]. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2015, 49(4): 724-731.
[9] WANG Ji-kui, LI Shao-bo. Quality evaluation algorithm for conflicting data sources based on true value finding[J]. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2015, 49(2): 303-318.
[10] CAI Hua-lin, CHEN Gang, CHEN Ke. Spatial matching on multi-type resource[J]. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2015, 49(1): 69-78.
[11] YU Dong-jin, YIN Yu-yu, WU Meng-meng, LIU Yu. QoS prediction for Web services based on hybrid collaborative filtering[J]. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2014, 48(11): 2039-2045.
[12] KE Hai-feng, YING Jing. Real-time license character recognition technology based on R-ELM[J]. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2014, 48(7): 1209-1216.
[13] LIU Zhi-hui, ZHANG Quan-ling. Research overview of big data technology[J]. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2014, 48(6): 957-972.
[14] TIAN Tian,GONG Dun-wei. Evolutionary generation of test data for path coverage through selecting target paths based on coverage difficulty[J]. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2014, 48(5): 948-994.
[15] KE Hai-feng, YING Jing. Real-time license character recognition technology based on R-ELM[J]. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2014, 48(2): 0-0.