Please wait a minute...
浙江大学学报(理学版)  2022, Vol. 49 Issue (3): 271-279    DOI: 10.3785/j.issn.1008-9497.2022.03.002
数学与计算机科学     
表征学习驱动的多重网络图采样
虞瑞麒1(),刘玉华1(),沈禧龙2,翟如钰1,张翔2,周志光1
1.杭州电子科技大学 数字媒体与艺术设计学院,浙江 杭州 310018
2.浙江财经大学 信息管理与人工智能学院,浙江 杭州 310018
Representation learning driven multiple graph sampling
Ruiqi YU1(),Yuhua LIU1(),Xilong SHEN2,Ruyu ZHAI1,Xiang ZHANG2,Zhiguang ZHOU1
1.School of Media and Design,Hangzhou Dianzi University,Hangzhou 310018,China
2.School of Information Management and Artificial Intelligence,Zhejiang University of Finance and Economics,Hangzhou 310018,China
 全文: PDF(2051 KB)   HTML( 1 )
摘要:

已有的图采样方法侧重于单图采样,关注如何在一张图上通过采样保留其特定的拓扑结构特征。随着数据采集能力的提升,多重网络图在实际应用中越来越普遍,即相同的节点集在不同场景中具有不同的网络关系。针对传统图采样方法无法兼顾多重网络图结构特征的问题,提出了表征学习驱动的多重网络图采样算法。首先,设计融合多重网络图结构特征的图表征学习方法,将节点投影至二维的表征学习空间;其次,利用改进的自适应蓝噪声采样算法,考虑节点密度和网络连通性,从表征学习空间筛选节点,以保持其多重网络结构特征及图上下文结构特征。进而开发了一套多重网络图采样可视分析系统,支持用户交互式地探索多重网络图采样,并与已有采样算法进行对比。案例分析和评估实验证明了本文算法在多重网络图采样中的有效性。

关键词: 多重网络图图采样可视分析评估    
Abstract:

The existing graph sampling techniques pay attention mainly to single graph sampling, focusing on how to preserve the specific topological features of a graph by sampling, such as node degree, cluster coef?cient, connectivity. With the improvement of data acquisition capability, multiple graphs, namely a set of nodes exhibits different relationships in different scenarios, have become quietly ubiquitous in the real world. To address this problem, a multiple graph sampling driven by representation learning is proposed. First, a graph representation learning method is designed to fuse the structural features of multiple graphs, through which the nodes are projected into a two-dimensional representation learning space. Then, considering node density and network connectivity, an improved the adaptive blue noise sampling algorithm is employed to select nodes from the representation learning space meanwhile simultaneously preserving the contextual structure features of multiple graphs. Furthermore, an interactive visual analytics system is developed allowing users to explore and analyze multiple graph sampling, and visually compare results with various sampling strategies. Case studies and the experimental results based on two real-world datasets have demonstrated the effectiveness of the proposed method in sampling multiple graphs.

Key words: multiple network graphs    graph sampling    visual analysis    evaluation
收稿日期: 2022-01-30 出版日期: 2022-05-24
CLC:  TP 391.41  
基金资助: 国家自然科学基金资助项目(61802339);浙江大学CAD&CG国家重点实验室开放课题(A2224)
通讯作者: 刘玉华     E-mail: yrq0121@hdu.edu.cn;liuyuhua@hdu.edu.cn
作者简介: 虞瑞麒(2001—),ORCID: https://orcid.org/0000-0001-6116-6678,男,本科生,主要从事数据可视化研究,E-mail:yrq0121@hdu.edu.cn.
服务  
把本文推荐给朋友
加入引用管理器
E-mail Alert
RSS
作者相关文章  
虞瑞麒
刘玉华
沈禧龙
翟如钰
张翔
周志光

引用本文:

虞瑞麒,刘玉华,沈禧龙,翟如钰,张翔,周志光. 表征学习驱动的多重网络图采样[J]. 浙江大学学报(理学版), 2022, 49(3): 271-279.

Ruiqi YU,Yuhua LIU,Xilong SHEN,Ruyu ZHAI,Xiang ZHANG,Zhiguang ZHOU. Representation learning driven multiple graph sampling. Journal of Zhejiang University (Science Edition), 2022, 49(3): 271-279.

链接本文:

https://www.zjujournals.com/sci/CN/10.3785/j.issn.1008-9497.2022.03.002        https://www.zjujournals.com/sci/CN/Y2022/V49/I3/271

图1  多重网络图采样算法流程(a)原始多重网络图 (b)融合多重网络图结构特征的表征学习 (c)多目标的自适应蓝噪声采样 (d)多重网络图采样结果
图2  泊松盘优化
图3  考虑多重网络图连通性的泊松盘遍历
图4  多重网络图采样可视分析系统
分类指标说明
节点平均中介度(ABD)4衡量桥节点保留情况,值越大被保留的桥节点越多,表现越好
平均中心性(ACC)4衡量节点中心性保持情况,与原始图结果越一致越好
连接性平均最短路径长度(APL)1采样图和原始图的平均最短路径长度,越一致越好
连通性(CC)4统计连通分量的数量,值越大连通性越差
社区结构社区相似性(SC)1度量采样前后社区结构直方图的分布变化,值越大越好
社区结构稳定性(SSC)1使用互信息度量采样前后社区结构的变化,值越小越好
聚类局部聚类系数(LCC)1采样图和原始图的局部聚类系数,越一致越好
全局聚类系数(GCC)1采样图和原始图的全局聚类系数,越一致越好
表1  评估指标
采样算法ABD↑△ACC↓△APL↓CC↓SC↑SSC↓△LCC↓△GCC↓
Our0.046 9610.010 2212.297 1593.380 4710.777 9011.429 3520.005 1960.001 791
RN0.005 3180.285 9303.272 23035.159 0320.452 8751.517 4110.035 8910.003 874
RE0.037 7840.039 3122.441 6875.512 2650.684 8011.420 5470.019 4820.000 556
TIE0.036 4190.051 7112.493 2305.341 2040.682 7881.407 5310.035 1300.001 108
SRW0.030 4040.057 2992.601 0518.974 0270.626 0091.416 4230.018 5090.001 172
RJ0.007 1050.270 3223.252 55833.216 7030.443 5921.511 9810.035 8910.003 874
ISRW0.032 8460.044 7542.360 1065.980 3260.653 2821.405 5180.035 6560.000 194
表2  各采样算法在MOSS数据集中的评估结果
采样算法ABD↑△ACC↓△APL↓CC↓SC↑SSC↓△LCC↓△GCC↓
Our0.043 3510.083 7694.159 63510.173 9040.808 1341.428 0930.162 4310.022 517
RN0.016 1030.204 6424.346 19118.462 0140.675 3091.470 7930.212 9500.090 168
RE0.027 1930.158 6574.168 92013.559 3120.710 2731.453 1030.163 0880.050 793
TIE0.025 4150.176 5524.183 46714.560 2310.707 7071.454 8000.174 6520.012 149
SRW0.023 1100.163 1374.126 94713.170 0320.725 7361.452 1830.151 3350.026 816
RJ0.014 9130.201 2434.333 99118.131 7960.687 0251.471 1070.214 3030.065 831
ISRW0.025 3110.137 6404.149 39412.129 0120.684 7571.440 6370.095 2680.085 155
表3  各采样算法在CELE数据集中的评估结果
图5  节点接近中心性分布直方图对比(MOSS)
图6  在MOSS数据集中的采样结果可视化对比
图7  在CELE数据集中的采样结果可视化对比
1 ZHOU Z, SHI C, SHEN X, et al. Context-aware sampling of large networks via graph representation learning[J]. IEEE Transactions on Visualization and Computer Graphics, 2020, 27(2): 1709-1719. DOI:10.1109/TVCG.2020.3030440
doi: 10.1109/TVCG.2020.3030440
2 CHENG K. Sampling from large graphs with a reservoir[C]// Proceedings of the 2014 17th International Conference on Network-Based Information Systems. Los Alamitos: IEEE Computer Society Press, 2014: 347-354. DOI:10. 1109/NBiS.2014.25
doi: 10. 1109/NBiS.2014.25
3 MAGNANI M, ROSSI L. Formation of multiple networks[C]// International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction. Berlin/Heidelberg: Springer, 2013: 257-264. DOI:10.1007/978-3-642-37210-0_28
doi: 10.1007/978-3-642-37210-0_28
4 SUN R, ZHANG L, CHEN Z. A sample-based approximate ranking method for large graphs[C]// 2018 6th International Conference on Advanced Cloud and Big Data(CBD). Los Alamitos: IEEE Computer Society Press, 2018: 112-117. DOI:10. 1109/CBD.2018.00029
doi: 10. 1109/CBD.2018.00029
5 TÜRKOGLU D, TURK A. Edge-based wedge sampling to estimate triangle counts in very large graphs[C]// 2017 IEEE International Conference on Data Mining(ICDM). Los Alamitos: IEEE Computer Society Press, 2017: 455-464. DOI:10. 1109/ICDM.2017.55
doi: 10. 1109/ICDM.2017.55
6 YOON S H, KIM K N, HONG J, et al. A community-based sampling method using DPL for online social networks[J]. Information Sciences, 2015, 306: 53-69. DOI:10.1016/j.ins. 2015.02.014
doi: 10.1016/j.ins. 2015.02.014
7 HU J, HONG S H, CHEN J, et al. Connectivity-based spectral sampling for big complex network visualization[C]// International Conference on Complex Networks and Their Applications. Cham: Springer, 2020: 237-248. DOI:10.1007/978-3-030-65347-7_20
doi: 10.1007/978-3-030-65347-7_20
8 ZHOU Z, MENG L, TANG C, et al. Visual abstraction of large scale geospatial origin-destination movement data[J]. IEEE Transactions on Visualization and Computer Graphics, 2018, 25(1): 43-53. DOI:10.1109/TVCG.2018.2864503
doi: 10.1109/TVCG.2018.2864503
9 GUO D, ZHU X. Origin-destination flow data smoothing and mapping[J]. IEEE Transactions on Visualization and Computer Graphics, 2014, 20(12): 2043-2052. DOI:10.1109/TVCG.2014. 2346271
doi: 10.1109/TVCG.2014. 2346271
10 ZHAO Y, JIANG H, CHEN Q, et al. Preserving minority structures in graph sampling[J]. IEEE Transactions on Visualization and Computer Graphics, 2020, 27(2): 1698-1708. DOI:10.1109/TVCG.2020.3030428
doi: 10.1109/TVCG.2020.3030428
11 MAIYA A S, BERGER-WOLF T Y. Benefits of bias: Towards better characterization of network sampling[C]// Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM Press, 2011: 105-113. DOI:10.1145/2020408. 2020431
doi: 10.1145/2020408. 2020431
12 NGUYEN Q H, HONG S H, EADES P, et al. Proxy graph: Visual quality metrics of big graph sampling[J]. IEEE Transactions on Visualization and Computer Graphics, 2017, 23(6): 1600-1611. DOI:10.1109/TVCG.2017.2674999
doi: 10.1109/TVCG.2017.2674999
13 WU Y, CAO N, ARCHAMBAULT D, et al. Evaluation of graph sampling: A visualization perspective[J]. IEEE Transactions on Visualization and Computer Graphics, 2016, 23(1): 401-410. DOI:10.1109/TVCG.2016.2598867
doi: 10.1109/TVCG.2016.2598867
14 ZHANG F Y, ZHANG S, WONG P C, et al. A visual evaluation study of graph sampling techniques[J]. Electronic Imaging, 2017, 2017(1): 110-117. DOI:10.2352/ISSN.2470-1173.2017.1.VDA-394
doi: 10.2352/ISSN.2470-1173.2017.1.VDA-394
15 ZHANG D, YIN J, ZHU X, et al. Network representation learning: A survey[J]. IEEE Transactions on Big Data, 2020, 6(1): 3-28. DOI:10.1109/TBDATA.2018.2850013 .
doi: 10.1109/TBDATA.2018.2850013
16 PEROZZI B, AL-RFOU R, SKIENA S. Deepwalk: online learning of social representations[C]// Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM Press, 2014: 701-710. DOI:10.1145/2623330.2623732
doi: 10.1145/2623330.2623732
17 GROVER A, LESKOVEC J. Node2vec: Scalable feature learning for networks[C]// Proceedings of the 22th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM Press, 2016: 855-864. DOI:10.1145/2939672.2939754
doi: 10.1145/2939672.2939754
18 WANG Y, CHEN L, JO J, et al. Joint t-SNE for comparable projections of multiple high-dimensional datasets[J]. IEEE Transactions on Visualization and Computer Graphics, 2022, 28(1): 623-632. DOI:10.1109/TVCG.2021.3114765 .
doi: 10.1109/TVCG.2021.3114765
[1] 冯翠翠, 叶观琼, 曾江宁, 林颖, 朱静. 浙江省海洋生态文明建设绩效评估研究[J]. 浙江大学学报(理学版), 2021, 48(5): 584-591.
[2] 何健薇, 笪玲, 杨建春. 乡村民宿经营者情感劳动价值评估研究[J]. 浙江大学学报(理学版), 2021, 48(5): 629-641.
[3] 徐敏, 王科, 戴浩然, 罗晓博, 余炜伦, 陶煜波, 林海. 基于电子病历的乳腺癌群组与治疗方案可视分析[J]. 浙江大学学报(理学版), 2021, 48(4): 391-401.
[4] 杨颐, 李国清, 王健, 王海军, 翟翊辰, 黄卫星. 云端结合的书法大数据平台[J]. 浙江大学学报(理学版), 2020, 47(4): 397-407.
[5] 赵喆, 张天野, 黄彦浩, 郑文庭, 陈为. 面向仿真数据的电网运行方式可视分析[J]. 浙江大学学报(理学版), 2020, 47(1): 36-44.
[6] 汪建妹, 杨华, 曾银欢, 吴俐勤, 李锐, 袁玉伟, 钱鸣蓉. 超高效液相色谱-串联质谱法测定蜂蜜中抗生素和农药浓度[J]. 浙江大学学报(理学版), 2018, 45(3): 330-342.
[7] 葛丹东, 徐威, 高宁. 基于主客体原真性的古镇更新策略与机制研究——以浙江乾元古镇为例[J]. 浙江大学学报(理学版), 2018, 45(2): 251-260.
[8] 仲俊涛,米文宝, 樊新刚,宋永永. 宁夏限制开发生态区农村社区发展能力评估[J]. 浙江大学学报(理学版), 2016, 43(1): 1-7.
[9] 杨晓鸣 寿彩丽 . 图书质量评估定量分析探讨 [J]. 浙江大学学报(理学版), 1998, 25(4): 96-99.