Vis Inf 2020, Vol. 4, Issue 1: 23-31. DOI: 10.1016/j.visinf.2019.12.003
Paper
Improving symbolic data visualization for pattern recognition and knowledge discovery
Kadri Umbleja, Manabu Ichino, Hiroyuki Yaguchi
School of Science and Engineering, Tokyo Denki University, Hatoyama, Saitama 350-0394, Japan
Abstract: This paper examines the visualization of symbolic data and considers the challenges arising from its complex structure. Symbolic data is usually aggregated from large data sets and is used to hide entry-specific details and to transform huge amounts of data (such as big data) into analyzable quantities. It is also used to offer an overview where general trends are more important than individual details. Symbolic data comes in many forms, such as intervals, histograms, categories, and modal multi-valued objects, and can also be considered as a distribution. Currently, the de facto visualization approach for symbolic data is zoomstars, which has many limitations. The biggest limitation is that the default distributions (histograms) are not supported in 2D, because an additional dimension is required. This paper proposes several improvements for zoomstars that enable it to visualize histograms in 2D by using a quantile or an equivalent interval approach. In addition, several improvements for categorical and modal variables are proposed to indicate the presented categories more clearly. Recommendations for different approaches to zoomstars are offered depending on the data type and the desired goal. Furthermore, an alternative approach called shape encoding is proposed, which allows the whole data set to be visualized in a comprehensive table-like graph. These visualizations and their usefulness are verified with three symbolic data sets in the exploratory data mining phase to identify trends, similar objects, and important features, and to detect outliers and discrepancies in the data.
Keywords: Data visualization; Symbolic data; Zoomstar; Shape encoding; Exploratory data analysis
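The abstract says only that histograms are shown in 2D zoomstars "by using a quantile or an equivalent interval approach", without giving the computation. The sketch below illustrates one common way to obtain such quantiles; it is not the authors' implementation. It assumes (our assumption, not from the paper) that each object's raw values are first aggregated into relative frequencies over fixed bins, and that values are uniformly distributed inside each bin, so every quantile is found by linear interpolation. The names `Histogram`, `aggregate_histogram`, and `quantiles` are hypothetical. The result is m + 1 plain numbers (quartiles when m = 4), each of which could be placed as a point on a single zoomstar axis, which is why no extra dimension is needed.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Histogram:
    """A histogram-valued symbolic variable for one object (hypothetical type).

    edges: k + 1 increasing bin boundaries.
    probs: k relative frequencies that sum to 1.
    """
    edges: List[float]
    probs: List[float]


def aggregate_histogram(values: Sequence[float], edges: Sequence[float]) -> Histogram:
    """Aggregate one object's raw values into relative frequencies over fixed bins.

    Bins are half-open [e_i, e_{i+1}); the last bin also includes its right edge.
    """
    edges = [float(e) for e in edges]
    k = len(edges) - 1
    counts = [0] * k
    for v in values:
        if not edges[0] <= v <= edges[-1]:
            raise ValueError(f"value {v} lies outside [{edges[0]}, {edges[-1]}]")
        for i in range(k):
            if edges[i] <= v < edges[i + 1] or (i == k - 1 and v == edges[-1]):
                counts[i] += 1
                break
    n = len(values)
    return Histogram(edges, [c / n for c in counts])


def quantiles(h: Histogram, m: int) -> List[float]:
    """Return Q_0, ..., Q_m at cumulative probabilities 0, 1/m, ..., 1.

    Assumes values are uniform within each bin, so each inner quantile is
    interpolated linearly inside the bin where the cumulative frequency
    first reaches the target probability.
    """
    nonempty = [i for i, p in enumerate(h.probs) if p > 0]
    q_min = h.edges[nonempty[0]]        # left edge of the first non-empty bin
    q_max = h.edges[nonempty[-1] + 1]   # right edge of the last non-empty bin
    result = [q_min]
    for j in range(1, m):
        target = j / m
        cum = 0.0
        for i, p in enumerate(h.probs):
            if p > 0 and cum + p >= target:
                frac = (target - cum) / p
                result.append(h.edges[i] + frac * (h.edges[i + 1] - h.edges[i]))
                break
            cum += p
        else:
            # Guard against rounding when the frequencies sum to slightly below 1.
            result.append(q_max)
    result.append(q_max)
    return result


if __name__ == "__main__":
    h = aggregate_histogram([1, 2, 11, 12], [0, 10, 20])
    print(h.probs)          # [0.5, 0.5]
    print(quantiles(h, 4))  # [0.0, 5.0, 10.0, 15.0, 20.0]
```

Choosing a small, fixed m for every object keeps the number of points per axis constant, so quantile values from different objects can be drawn and compared on the same zoomstar axes.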
Published: 2020-01-14
Corresponding author: Kadri Umbleja. E-mail: kadriumbleja@gmail.com

Cite this article:

Kadri Umbleja, Manabu Ichino, Hiroyuki Yaguchi. Improving symbolic data visualization for pattern recognition and knowledge discovery. Vis Inf, 2020, 4(1): 23-31.

Link to this article:

http://www.zjujournals.com/vi/CN/10.1016/j.visinf.2019.12.003
http://www.zjujournals.com/vi/CN/Y2020/V4/I1/23