Front. Inform. Technol. Electron. Eng.  2015, Vol. 16 Issue (6): 457-465    DOI: 10.1631/FITEE.1400352
    
Topic modeling for large-scale text data
Xi-ming Li, Ji-hong Ouyang, You Lu
College of Computer Science and Technology, Jilin University, Changchun 130012, China; MOE Key Laboratory of Symbolic Computation and Knowledge Engineering, Jilin University, Changchun 130012, China
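Summary: Objective: To study online inference algorithms for topic models on large-scale data. To address the large error of the stochastic gradient in stochastic variational inference, a moving average stochastic variational inference algorithm is proposed.
Innovation: The moving average of the stochastic gradients from multiple iterations is used in place of the actual stochastic gradient, which reduces the error between the stochastic gradient and the true gradient.
Methods: The study is built on latent Dirichlet allocation, the basic topic model. Because the document subsets sampled at different iterations contain different vocabularies (Table 1), the moving average of the stochastic terms from different iterations is used to approximate the stochastic term of the actual stochastic gradient. To preserve accuracy as far as possible, only the stochastic terms of the most recent R iterations are used (Fig. 2), and the convergence of the proposed algorithm is verified.
Conclusions: Building on stochastic variational inference, a moving average stochastic variational inference algorithm is proposed that achieves better topic modeling of text and faster convergence.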
Abstract: This paper develops a novel online algorithm, moving average stochastic variational inference (MASVI), which uses the results of previous iterations to smooth out noisy natural gradients. We analyze the convergence of the proposed algorithm and conduct a set of experiments on two large-scale collections that contain millions of documents. Experimental results indicate that, compared with stochastic variational inference (SVI) and SGRLD, our algorithm achieves a faster convergence rate and better performance.
Key words: Latent Dirichlet allocation (LDA); Topic modeling; Online learning; Moving average
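The smoothing idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `MovingAverageUpdater`, the parameter names, and the step-size schedule rho_t = (tau + t)^(-kappa), which is borrowed from standard stochastic variational inference, are all assumptions made for illustration. The sketch keeps the stochastic terms of the most recent R iterations, averages them, and uses that average in the usual interpolation update of the topic parameters. It does not model the vocabulary differences between minibatches that the paper takes into account.

```python
from collections import deque

import numpy as np


class MovingAverageUpdater:
    """Hypothetical helper: smooths the noisy stochastic term over a window."""

    def __init__(self, eta, window, tau=1.0, kappa=0.7):
        self.eta = eta                      # Dirichlet prior on topics (assumed)
        self.window = deque(maxlen=window)  # last R stochastic terms
        self.tau = tau                      # step-size delay (assumed schedule)
        self.kappa = kappa                  # step-size decay (assumed schedule)
        self.t = 0                          # iteration counter

    def step(self, lam, stochastic_term):
        """Return updated topic parameters given one minibatch estimate.

        lam: current K x V topic parameters.
        stochastic_term: K x V minibatch estimate of the expected
            topic-word counts, already rescaled to the corpus size.
        """
        self.t += 1
        self.window.append(np.asarray(stochastic_term, dtype=float))
        # Average over at most the R most recent iterations.
        averaged = np.mean(np.stack(self.window), axis=0)
        rho = (self.tau + self.t) ** (-self.kappa)
        # Interpolate between the old parameters and the smoothed estimate.
        return (1.0 - rho) * lam + rho * (self.eta + averaged)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    updater = MovingAverageUpdater(eta=0.01, window=5)
    lam = np.ones((3, 8))  # 3 topics, 8 vocabulary words (toy sizes)
    for _ in range(10):
        lam = updater.step(lam, rng.gamma(1.0, 1.0, size=lam.shape))
    print(lam.shape)
```

With `window=1` the sketch reduces to a plain stochastic update; larger windows trade some responsiveness for lower variance in the estimate.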
Received: 2014-10-15    Published: 2015-06-04
CLC:  TP391.1  

Cite this article:

Xi-ming Li, Ji-hong Ouyang, You Lu. Topic modeling for large-scale text data. Front. Inform. Technol. Electron. Eng., 2015, 16(6): 457-465.

Link to this article:

http://www.zjujournals.com/xueshu/fitee/CN/10.1631/FITEE.1400352
http://www.zjujournals.com/xueshu/fitee/CN/Y2015/V16/I6/457
