|
|
Research on subgraph matching optimization based on GPU |
An-teng LI1( ),Peng-jie CUI2,Ye YUAN1,*( ),Guo-ren WANG1 |
1. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China 2. School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China |
|
|
Abstract An efficient graphic processing unit (GPU)-based subgraph matching algorithm GpSI was proposed, and the optimization schemes were designed for the filtering phase and the joining phase of the mainstream algorithms. In the filtering phase, a filtering algorithm based on the composite signatures was proposed, and the local quantitative and structural features of the vertices were used to improve the filtering ability of the candidate sets. In the joining phase, a joining strategy based on candidate vertices was adopted. The space was pre-allocated at the granularity of the minimum number of neighbors, an efficient set operations was designed to realize the joining, and the extra overhead caused by the repeated joins in the traditional method was avoided. Experimental results of multiple datasets show that GpSI has obvious advantages in the filtering ability of the candidate set, the execution time, the GPU memory usage and the stability compared with the mainstream GPU subgraph matching algorithms. In a real data set experiment, the execution time of GpSI was 2 to 10 times faster compared to the execution time of GPU-friendly subgraph isomorphism algorithm.
|
Received: 31 October 2022
Published: 16 October 2023
|
|
Fund: 国家自然科学基金资助项目(61932004, 61732003); 中央高校基本科研基金资助项目(N181605012) |
Corresponding Authors:
Ye YUAN
E-mail: lee_anteng@qq.com;yuan-ye@bit.edu.cn
|
基于GPU的子图匹配优化技术
提出高效的基于图形处理器(GPU)的子图匹配算法GpSI,针对主流算法的过滤阶段和连接阶段分别设计优化方案. 提出基于复合签名的过滤算法,在过滤阶段利用结点所处局部的数量特征和结构特征提升候选集过滤能力. 采用基于候选点的连接策略,在连接阶段以最小邻居数为粒度预分配空间,设计高效的集合运算,避免传统方法重复连接的额外开销. 多个数据集测试结果表明GpSI较主流GPU子图匹配算法在候选集过滤能力、执行用时、GPU内存占用和稳定性上均有明显优势. 在真实数据集测试中,相比GPU友好子图匹配算法,GpSI的执行用时加速2~10倍.
关键词:
子图同构,
数据挖掘,
图形处理器(GPU),
并行计算,
高性能计算
|
|
[1] |
WANG J, CHENG J. Truss decomposition in massive networks [C]// proceedings of the VLDB Endowment. [S.l.]: VLDB Endowment, 2012, 5(9): 812–823.
|
|
|
[2] |
KAIRAM S R, WANG D J, LESKOVEC J. The life and death of online groups: predicting group growth and longevity [C]// Proceedings of the Fifth ACM International Conference on Web Search and Data Mining. [S.l.]: ACM, 2012: 673–682.
|
|
|
[3] |
PÉREZ J, ARENAS M, GUTIERREZ C. Semantics and complexity of sparql [C]// ACM Transactions on Database Systems, 2009, 16: 1-45.
|
|
|
[4] |
CHRISTOPHIDES V. Resource description framework (RDF) schema (RDFS) [M]// LIU L, ÖZSU M T. Encyclopedia of database systems. Boston: Springer, 2009: 2425-2428.
|
|
|
[5] |
ZENG L, ZOU L Redesign of the gstore system[J]. Frontiers Computer Science, 2018, 12: 623- 641
doi: 10.1007/s11704-018-7212-z
|
|
|
[6] |
YAN X F, YU P S, HAN J W. Graph indexing: a frequent structure-based approach [C]// Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data. [S.l.]: ACM, 2004: 335-346.
|
|
|
[7] |
LIU H B, BLOUIN C, KEŠELJ V. Biological event extraction using subgraph matching [EB/OL]. [2022-06-01]. https://ceur-ws.org/Vol-714/ShortPaper03_Liu.pdf.
|
|
|
[8] |
GAREY M R, JOHNSON D S. Computers and intractability: a guide to the theory of NP-completeness [M]. [S.l.]: W. H. Freeman and Company, 1979.
|
|
|
[9] |
CONTE D, FOGGIA P, SANSONE C, et al Thirty years of graph matching in pattern recognition[J]. International Journal of Pattern Recognition and Artificial Intelligence, 2004, 18 (3): 265- 298
doi: 10.1142/S0218001404003228
|
|
|
[10] |
LIN H, ZHU X W, YU B W, et al. ShenTu: processing multi-trillion edge graphs on millions of cores in seconds [C]// SC18: International Conference for High Performance Computing, Networking, Storage and Analysis. Dallas: IEEE, 2018: 706-716.
|
|
|
[11] |
MENG K, LI J J, TAN G M, et al. A pattern based algorithmic autotuner for graph processing on GPUs [C]// Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming. [S.l.]: ACM, 2019: 201-213.
|
|
|
[12] |
KHORASANI F, VORA K, GUPTA R, et al. CuSha: vertex-centric graph processing on GPUs [C]// Proceeding of the 23rd International Symposium on High-Performance Parallel and Distributed Computing. [S.l.]: ACM, 2014: 239-252.
|
|
|
[13] |
ULLMANN J R. An algorithm for subgraph isomorphism [J] Journal of the ACM, 1976, 23(1): 31-42.
|
|
|
[14] |
CORDELLA L P, FOGGIA P, SANSONE C, et al A (sub)graph isomorphism algorithm for matching large graphs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004, 26 (10): 1367- 1372
doi: 10.1109/TPAMI.2004.75
|
|
|
[15] |
CARLETTI V, FOGGIA P, SAGGESE A, et al Challenging the time complexity of exact subgraph isomorphism for huge and dense graphs with VF3[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40 (4): 804- 818
doi: 10.1109/TPAMI.2017.2696940
|
|
|
[16] |
HAN W S, LEE J, LEE J H. TurboISO: towards ultrafast and robust subgraph isomorphism search in large graph databases [C]// Proceedings of the ACM SIGMOD International Conference on Management of Data. [S.l.]: ACM, 2013: 337-348.
|
|
|
[17] |
HE H, SINGH A K. Graphs-at-a-time: query language and access methods for graph databases [C]// Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. [S.l.]: ACM, 2008: 405-418.
|
|
|
[18] |
SUN Z, WANG H Z, WANG H X, et al. Efficient subgraph matching on billion node graphs [C]// Proceedings of the VLDB Endowment. [S.l.]: VLDB Endowment, 2012, 5(9): 788-799.
|
|
|
[19] |
ZHANG S J, LI S R, YANG J. GADDI: distance index based subgraph matching in biological networks [C]// Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology. [S.l.]: ACM, 2009: 192-203.
|
|
|
[20] |
LEE J, HAN W S, KASPEROVICS R, et al. An in-depth comparison of subgraph isomorphism algorithms in graph databases [C]// Proceedings of the VLDB Endowment. [S.l.]: VLDB Endowment, 2012, 6(2): 133-144.
|
|
|
[21] |
徐周波, 李珍, 刘华东, 等 基于邻居信息聚合的子图同构匹配算法[J]. 计算机应用, 2021, 41 (1): 43- 47 XU Zhou-bo, LI Zhen, LIU Hua-dong, et al Subgraph isomorphism matching algorithm based on neighbor information aggregation[J]. Journal of Computer Applications, 2021, 41 (1): 43- 47
|
|
|
[22] |
MOORMAN J D, CHEN Q Y, TU T K, et al. Filtering methods for subgraph matching on multiplex networks [C]// Proceedings of the 2018 IEEE International Conference on Big Data. Seattle: IEEE, 2018: 3980-3985.
|
|
|
[23] |
TRAN H N, KIM J J, HE B S. Fast subgraph matching on large graphs using graphics processors [C]// Proceedings of the 2015 International Conference on Database Systems for Advanced Applications. [S.l.]: Springer, 2015: 299-315.
|
|
|
[24] |
WANG L, OWENS J D. Fast Gunrock subgraph matching (GSM) on GPUs [EB/OL]. [2022-06-01]. https://arxiv.org/pdf/2003.01527.pdf.
|
|
|
[25] |
WANG Y Z H, DAVIDSON A, PAN Y C, et al. Gunrock: a high-performance graph processing library on the GPU [C]// Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. [S.l.]: ACM, 2015: 265-266
|
|
|
[26] |
ZENG L, ZOU L, ÖZSU M T, et al. GSI: GPU-friendly subgraph isomorphism [C]// Proceedings of the 2020 IEEE 36th International Conference on Data Engineering. Dallas: IEEE, 2020: 1249-1260.
|
|
|
[27] |
BONNICI V, GIUGNO R, BOMBIERI N. An efficient implementation of a subgraph isomorphism algorithm for GPUs [C]// Proceedings of the 2018 IEEE International Conference on Bioinformatics and Biomedicine. Madrid: IEEE, 2018: 2674-2681.
|
|
|
[28] |
汪洋, 江世杰, 曹宇聪, 等 多编码树GPU并行轴心子图匹配[J]. 计算机应用, 2022, 42 (1): 132- 139 WAGN Yang, JIANG Shi-jie, CAO Yu-cong, et al Parallel pivoted subgraph matching with multiple coding trees on GPU[J]. Journal of Computer Applications, 2022, 42 (1): 132- 139
|
|
|
[29] |
KIRK D. NVIDIA CUDA software and GPU parallel computing architecture [C]// Proceedings of the 6th International Symposium on Memory Management. [S.l.]: ACM, 2007: 103–104.
|
|
|
[30] |
NVIDIA: CUDA Toolkit documentation v10.2 [CP/OL]. (2019-11-28)[2022-06-01]. https://docs.nvidia.com/cuda/archive/10.2/.
|
|
|
[31] |
ASHARI A, SEDAGHATI N, EISENLOHR J, et al. Fast sparse matrix-vector multiplication on GPUs for graph applications [C]// Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. New Orleans: IEEE, 2014: 781-792.
|
|
|
[32] |
李岑浩, 崔鹏杰, 袁野, 等 面向大图子图匹配的多GPU编程模型[J]. 计算机科学与探索, 2023, 17 (7): 1576- 1585 LI Cen-hao, CUI Peng-jie, YUAN Ye, et al Muti-GPU programming model for subgraph matching in large graphs[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17 (7): 1576- 1585
|
|
|
[33] |
LESKOVEC J. Stanford large network dataset collection [DS/OL]. (2021-09-17)[2022-06-01]. http://snap.stanford.edu/data/.
|
|
|
[34] |
REZA T, RIPEANU M, TRIPOUL N, et al. PruneJuice: pruning trillion-edge graphs to a precise pattern-matching solution [C]// Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis. Dallas: IEEE, 2018: 1-17.
|
|
|
[35] |
Boost Library: VF2 (Sub)Graph Isomorphism-master [CP/OL]. (2013-04-26) [2022-06-01]. https://www.boost.org/doc/libs/master/libs/graph.
|
|
|
|
Viewed |
|
|
|
Full text
|
|
|
|
|
Abstract
|
|
|
|
|
Cited |
|
|
|
|
|
Shared |
|
|
|
|
|
Discussed |
|
|
|
|