Vis Inf  2017, Vol. 1 Issue (1): 40-47    DOI: 10.1016/j.visinf.2017.01.005
Yi Yang (a, b), Quanming Yao (a), Huamin Qu (a)
a Hong Kong University of Science and Technology, Hong Kong; 
b Lenovo Group Limited, Hong Kong
VISTopic: A visual analytics system for making sense of large document collections using hierarchical topic modeling
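Abstract: Background: Given the ever-growing volume of text data, effectively analyzing massive text collections is a highly challenging problem. In recent years, text mining techniques that automatically extract key information from massive text data have developed rapidly. Topic modeling, a novel technique for extracting thematic structure from documents, is widely used to generate text summaries and to foster an overall understanding of corpus content. Although powerful, this technique may not be directly applicable to general analysis, because topics and topic-document relationships are usually represented as probabilities in the model. Moreover, information that plays an important role in knowledge discovery, such as time and authorship, is hard to reflect in topic modeling for comprehensive analysis.

Contribution: To address this problem, this paper presents VISTopic, a visual analytics system based on topic modeling, to help users better understand large text collections. VISTopic first uses a novel hierarchical latent tree model (HLTM) to extract a set of hierarchical topics. Specifically, a topic view that incorporates the model's features is designed to support overall understanding and interactive exploration of the topic hierarchy. To leverage multi-perspective information for visual analytics, VISTopic also provides an evolution view that shows how topics evolve and a document view that shows details of the documents related to each topic.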

Effective analysis of large text collections remains a challenging problem given the growing volume of available text data. Recently, text mining techniques have developed rapidly for automatically extracting key information from massive text data. Topic modeling, one of the novel techniques that extract thematic structure from documents, is widely used to generate text summaries and foster an overall understanding of corpus content. Although powerful, this technique may not be directly applicable to general analytics scenarios, since topics and topic–document relationships are often presented probabilistically in models. Moreover, information that plays an important role in knowledge discovery, such as times and authors, is rarely reflected in topic modeling for comprehensive analysis. In this paper, we address this issue by presenting a visual analytics system, VISTopic, that helps users make sense of large document collections based on topic modeling. VISTopic first extracts a set of hierarchical topics using a novel hierarchical latent tree model (HLTM) (Liu et al., 2014). Specifically, a topic view that accounts for the model's features is designed for overall understanding and interactive exploration of the topic organization. To leverage multi-perspective information for visual analytics, VISTopic further provides an evolution view to reveal topic trends and a document view to show details of topical documents. Three case studies based on the IEEE VIS conference dataset demonstrate the effectiveness of our system in gaining insights from large document collections.
Key words: Topic modeling; Text visualization; Visual analytics
Published: 2017-07-06
We would like to thank Professor Nevin L. Zhang and his Ph.D. student Peixian Chen from the Department of Computer Science and Engineering at HKUST for their kind technical support on topic modeling. In addition, we would like to thank Siwei Fu for preparing the PDF files of the IEEE VIS corpus and for discussions on data processing. This project is funded by a grant proposal (Ref: YBCB2009041-44) of Huawei Technologies Noah’s Ark Lab. We are grateful for Huawei’s generous support, as well as for data acquisition. Finally, we thank the anonymous reviewers for their valuable feedback.
Corresponding author: Yi Yang


Yi Yang, Quanming Yao, Huamin Qu. VISTopic: A visual analytics system for making sense of large document collections using hierarchical topic modeling. Vis Inf, 2017, 1(1): 40-47.

