Please wait a minute...
浙江大学学报(工学版)  2023, Vol. 57 Issue (9): 1856-1864    DOI: 10.3785/j.issn.1008-973X.2023.09.017
计算机技术     
基于GPU的子图匹配优化技术
李安腾1(),崔鹏杰2,袁野1,*(),王国仁1
1. 北京理工大学 计算机学院,北京 100081
2. 东北大学 计算机科学与工程学院,辽宁 沈阳 110169
Research on subgraph matching optimization based on GPU
An-teng LI1(),Peng-jie CUI2,Ye YUAN1,*(),Guo-ren WANG1
1. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
2. School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China
 全文: PDF(1352 KB)   HTML
摘要:

提出高效的基于图形处理器(GPU)的子图匹配算法GpSI,针对主流算法的过滤阶段和连接阶段分别设计优化方案. 提出基于复合签名的过滤算法,在过滤阶段利用结点所处局部的数量特征和结构特征提升候选集过滤能力. 采用基于候选点的连接策略,在连接阶段以最小邻居数为粒度预分配空间,设计高效的集合运算,避免传统方法重复连接的额外开销. 多个数据集测试结果表明GpSI较主流GPU子图匹配算法在候选集过滤能力、执行用时、GPU内存占用和稳定性上均有明显优势. 在真实数据集测试中,相比GPU友好子图匹配算法,GpSI的执行用时加速2~10倍.

关键词: 子图同构数据挖掘图形处理器(GPU)并行计算高性能计算    
Abstract:

An efficient graphic processing unit (GPU)-based subgraph matching algorithm GpSI was proposed, and the optimization schemes were designed for the filtering phase and the joining phase of the mainstream algorithms. In the filtering phase, a filtering algorithm based on the composite signatures was proposed, and the local quantitative and structural features of the vertices were used to improve the filtering ability of the candidate sets. In the joining phase, a joining strategy based on candidate vertices was adopted. The space was pre-allocated at the granularity of the minimum number of neighbors, an efficient set operations was designed to realize the joining, and the extra overhead caused by the repeated joins in the traditional method was avoided. Experimental results of multiple datasets show that GpSI has obvious advantages in the filtering ability of the candidate set, the execution time, the GPU memory usage and the stability compared with the mainstream GPU subgraph matching algorithms. In a real data set experiment, the execution time of GpSI was 2 to 10 times faster compared to the execution time of GPU-friendly subgraph isomorphism algorithm.

Key words: subgraph isomorphism    data mining    graphic processing unit (GPU)    parallel computing    high performance computing
收稿日期: 2022-10-31 出版日期: 2023-10-16
CLC:  TP 399  
基金资助: 国家自然科学基金资助项目(61932004, 61732003); 中央高校基本科研基金资助项目(N181605012)
通讯作者: 袁野     E-mail: lee_anteng@qq.com;yuan-ye@bit.edu.cn
作者简介: 李安腾(1998—),男,硕士生,从事GPU图数据管理研究. orcid.org/0000-0002-8763-5441. E-mail: lee_anteng@qq.com
服务  
把本文推荐给朋友
加入引用管理器
E-mail Alert
作者相关文章  
李安腾
崔鹏杰
袁野
王国仁

引用本文:

李安腾,崔鹏杰,袁野,王国仁. 基于GPU的子图匹配优化技术[J]. 浙江大学学报(工学版), 2023, 57(9): 1856-1864.

An-teng LI,Peng-jie CUI,Ye YUAN,Guo-ren WANG. Research on subgraph matching optimization based on GPU. Journal of ZheJiang University (Engineering Science), 2023, 57(9): 1856-1864.

链接本文:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2023.09.017        https://www.zjujournals.com/eng/CN/Y2023/V57/I9/1856

图 1  子图匹配示例
图 2  GpSI的框架
图 3  数量签名
图 4  结点的局部路径
图 5  结构签名
图 6  复合签名
图 7  生成新匹配表
图 8  交集运算
数据集 NV NE MD 是否幂律图
facebook 4 039 88 234 1 045
enron 36 692 183 831 1 383
gowalla 196 591 950 327 14 730
dblp 317 080 1 049 866 343
road 1 379 917 1 921 660 12
表 1  子图匹配实验的数据集信息
图 9  查询图
数据集 $\min {\text{|}}C(u){\text{|}}$ t/ms
GpSI GSI GpSI GSI
facebook 647 657 3 4
enron 2211 2429 4 5
gowalla 9299 14197 6 7
dblp 19475 23026 7 11
road 3869 47973 17 22
表 2  2种算法在不同数据集上的过滤能力测试结果
图 10  算法的执行用时对比
图 11  算法的图形处理器内存占用对比
图 12  3种算法的稳定性测试
1 WANG J, CHENG J. Truss decomposition in massive networks [C]// proceedings of the VLDB Endowment. [S.l.]: VLDB Endowment, 2012, 5(9): 812–823.
2 KAIRAM S R, WANG D J, LESKOVEC J. The life and death of online groups: predicting group growth and longevity [C]// Proceedings of the Fifth ACM International Conference on Web Search and Data Mining. [S.l.]: ACM, 2012: 673–682.
3 PÉREZ J, ARENAS M, GUTIERREZ C. Semantics and complexity of sparql [C]// ACM Transactions on Database Systems, 2009, 16: 1-45.
4 CHRISTOPHIDES V. Resource description framework (RDF) schema (RDFS) [M]// LIU L, ÖZSU M T. Encyclopedia of database systems. Boston: Springer, 2009: 2425-2428.
5 ZENG L, ZOU L Redesign of the gstore system[J]. Frontiers Computer Science, 2018, 12: 623- 641
doi: 10.1007/s11704-018-7212-z
6 YAN X F, YU P S, HAN J W. Graph indexing: a frequent structure-based approach [C]// Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data. [S.l.]: ACM, 2004: 335-346.
7 LIU H B, BLOUIN C, KEŠELJ V. Biological event extraction using subgraph matching [EB/OL]. [2022-06-01]. https://ceur-ws.org/Vol-714/ShortPaper03_Liu.pdf.
8 GAREY M R, JOHNSON D S. Computers and intractability: a guide to the theory of NP-completeness [M]. [S.l.]: W. H. Freeman and Company, 1979.
9 CONTE D, FOGGIA P, SANSONE C, et al Thirty years of graph matching in pattern recognition[J]. International Journal of Pattern Recognition and Artificial Intelligence, 2004, 18 (3): 265- 298
doi: 10.1142/S0218001404003228
10 LIN H, ZHU X W, YU B W, et al. ShenTu: processing multi-trillion edge graphs on millions of cores in seconds [C]// SC18: International Conference for High Performance Computing, Networking, Storage and Analysis. Dallas: IEEE, 2018: 706-716.
11 MENG K, LI J J, TAN G M, et al. A pattern based algorithmic autotuner for graph processing on GPUs [C]// Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming. [S.l.]: ACM, 2019: 201-213.
12 KHORASANI F, VORA K, GUPTA R, et al. CuSha: vertex-centric graph processing on GPUs [C]// Proceeding of the 23rd International Symposium on High-Performance Parallel and Distributed Computing. [S.l.]: ACM, 2014: 239-252.
13 ULLMANN J R. An algorithm for subgraph isomorphism [J] Journal of the ACM, 1976, 23(1): 31-42.
14 CORDELLA L P, FOGGIA P, SANSONE C, et al A (sub)graph isomorphism algorithm for matching large graphs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004, 26 (10): 1367- 1372
doi: 10.1109/TPAMI.2004.75
15 CARLETTI V, FOGGIA P, SAGGESE A, et al Challenging the time complexity of exact subgraph isomorphism for huge and dense graphs with VF3[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40 (4): 804- 818
doi: 10.1109/TPAMI.2017.2696940
16 HAN W S, LEE J, LEE J H. TurboISO: towards ultrafast and robust subgraph isomorphism search in large graph databases [C]// Proceedings of the ACM SIGMOD International Conference on Management of Data. [S.l.]: ACM, 2013: 337-348.
17 HE H, SINGH A K. Graphs-at-a-time: query language and access methods for graph databases [C]// Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. [S.l.]: ACM, 2008: 405-418.
18 SUN Z, WANG H Z, WANG H X, et al. Efficient subgraph matching on billion node graphs [C]// Proceedings of the VLDB Endowment. [S.l.]: VLDB Endowment, 2012, 5(9): 788-799.
19 ZHANG S J, LI S R, YANG J. GADDI: distance index based subgraph matching in biological networks [C]// Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology. [S.l.]: ACM, 2009: 192-203.
20 LEE J, HAN W S, KASPEROVICS R, et al. An in-depth comparison of subgraph isomorphism algorithms in graph databases [C]// Proceedings of the VLDB Endowment. [S.l.]: VLDB Endowment, 2012, 6(2): 133-144.
21 徐周波, 李珍, 刘华东, 等 基于邻居信息聚合的子图同构匹配算法[J]. 计算机应用, 2021, 41 (1): 43- 47
XU Zhou-bo, LI Zhen, LIU Hua-dong, et al Subgraph isomorphism matching algorithm based on neighbor information aggregation[J]. Journal of Computer Applications, 2021, 41 (1): 43- 47
22 MOORMAN J D, CHEN Q Y, TU T K, et al. Filtering methods for subgraph matching on multiplex networks [C]// Proceedings of the 2018 IEEE International Conference on Big Data. Seattle: IEEE, 2018: 3980-3985.
23 TRAN H N, KIM J J, HE B S. Fast subgraph matching on large graphs using graphics processors [C]// Proceedings of the 2015 International Conference on Database Systems for Advanced Applications. [S.l.]: Springer, 2015: 299-315.
24 WANG L, OWENS J D. Fast Gunrock subgraph matching (GSM) on GPUs [EB/OL]. [2022-06-01]. https://arxiv.org/pdf/2003.01527.pdf.
25 WANG Y Z H, DAVIDSON A, PAN Y C, et al. Gunrock: a high-performance graph processing library on the GPU [C]// Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. [S.l.]: ACM, 2015: 265-266
26 ZENG L, ZOU L, ÖZSU M T, et al. GSI: GPU-friendly subgraph isomorphism [C]// Proceedings of the 2020 IEEE 36th International Conference on Data Engineering. Dallas: IEEE, 2020: 1249-1260.
27 BONNICI V, GIUGNO R, BOMBIERI N. An efficient implementation of a subgraph isomorphism algorithm for GPUs [C]// Proceedings of the 2018 IEEE International Conference on Bioinformatics and Biomedicine. Madrid: IEEE, 2018: 2674-2681.
28 汪洋, 江世杰, 曹宇聪, 等 多编码树GPU并行轴心子图匹配[J]. 计算机应用, 2022, 42 (1): 132- 139
WAGN Yang, JIANG Shi-jie, CAO Yu-cong, et al Parallel pivoted subgraph matching with multiple coding trees on GPU[J]. Journal of Computer Applications, 2022, 42 (1): 132- 139
29 KIRK D. NVIDIA CUDA software and GPU parallel computing architecture [C]// Proceedings of the 6th International Symposium on Memory Management. [S.l.]: ACM, 2007: 103–104.
30 NVIDIA: CUDA Toolkit documentation v10.2 [CP/OL]. (2019-11-28)[2022-06-01]. https://docs.nvidia.com/cuda/archive/10.2/.
31 ASHARI A, SEDAGHATI N, EISENLOHR J, et al. Fast sparse matrix-vector multiplication on GPUs for graph applications [C]// Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. New Orleans: IEEE, 2014: 781-792.
32 李岑浩, 崔鹏杰, 袁野, 等 面向大图子图匹配的多GPU编程模型[J]. 计算机科学与探索, 2023, 17 (7): 1576- 1585
LI Cen-hao, CUI Peng-jie, YUAN Ye, et al Muti-GPU programming model for subgraph matching in large graphs[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17 (7): 1576- 1585
33 LESKOVEC J. Stanford large network dataset collection [DS/OL]. (2021-09-17)[2022-06-01]. http://snap.stanford.edu/data/.
34 REZA T, RIPEANU M, TRIPOUL N, et al. PruneJuice: pruning trillion-edge graphs to a precise pattern-matching solution [C]// Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis. Dallas: IEEE, 2018: 1-17.
35 Boost Library: VF2 (Sub)Graph Isomorphism-master [CP/OL]. (2013-04-26) [2022-06-01]. https://www.boost.org/doc/libs/master/libs/graph.
[1] 李根,翟伟,邬岚. 基于梯度提升决策树的汇合交互作用研究[J]. 浙江大学学报(工学版), 2022, 56(4): 649-655.
[2] 林鹤翔,乔连鹏,袁野,王国仁. 基于GPU的大图数据上的关键字检索算法[J]. 浙江大学学报(工学版), 2022, 56(2): 271-279.
[3] 陈玉,戴华,李博涵,杨庚. 面向移动对象的松散型传染模式挖掘方法[J]. 浙江大学学报(工学版), 2022, 56(2): 280-287.
[4] 戴天伦,李博涵,臧亚磊,戴华,于自强,陈钢. PORP:面向无人驾驶的路径规划并行优化策略[J]. 浙江大学学报(工学版), 2022, 56(2): 329-337.
[5] 李鑫悦,陈淑琴,李鸿亮,楼云霄,李佳鹤. 应用数据挖掘的高校教学建筑空调使用及其能耗分析[J]. 浙江大学学报(工学版), 2020, 54(9): 1677-1689.
[6] 龙立,郑山锁,周炎,贺金川,孟宏立,蔡永龙. 基于拟蒙特卡罗方法的供水管网抗震可靠性分析并行化研究[J]. 浙江大学学报(工学版), 2020, 54(2): 241-247.
[7] 周佳立, 陈以军, 武敏. 基于FPGA监听的图像采集与预处理方法[J]. 浙江大学学报(工学版), 2018, 52(2): 398-405.
[8] 王成龙,李诚,冯毅萍,荣冈. 作业车间调度规则的挖掘方法研究[J]. 浙江大学学报(工学版), 2015, 49(3): 421-429.
[9] 胡德超, 池龙哲, 杨琼, 王敏. 水库坝区冲刷漏斗的形成机理[J]. 浙江大学学报(工学版), 2015, 49(2): 257-264.
[10] 柯海丰,应晶. 基于R-ELM的实时车牌字符识别技术[J]. 浙江大学学报(工学版), 2014, 48(7): 1209-1216.
[11] 柯海丰,应晶. 基于R-ELM的实时车牌字符识别技术[J]. J4, 2014, 48(2): 0-0.
[12] 王相兵,童水光,钟崴,张健. 基于可拓重用的液压挖掘机结构性能方案设计[J]. J4, 2013, 47(11): 1992-2002.
[13] 杨鑫, 许端清, 赵磊, 杨冰. 二级光线跟踪的并行计算[J]. J4, 2012, 46(10): 1796-1802.
[14] 刘骥, 朱庆生, 黄晓凤, 曾令秋, 李松阳. 基于GPU的植物生长模拟[J]. J4, 2012, 46(10): 1803-1809.
[15] 杨鑫 , 王天明, 许端清. 基于GPU的层次包围盒快速构造方法[J]. J4, 2012, 46(1): 84-89.