Please wait a minute...
Journal of ZheJiang University (Engineering Science)  2023, Vol. 57 Issue (9): 1856-1864    DOI: 10.3785/j.issn.1008-973X.2023.09.017
    
Research on subgraph matching optimization based on GPU
An-teng LI1(),Peng-jie CUI2,Ye YUAN1,*(),Guo-ren WANG1
1. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
2. School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China
Download: HTML     PDF(1352KB) HTML
Export: BibTeX | EndNote (RIS)      

Abstract  

An efficient graphic processing unit (GPU)-based subgraph matching algorithm GpSI was proposed, and the optimization schemes were designed for the filtering phase and the joining phase of the mainstream algorithms. In the filtering phase, a filtering algorithm based on the composite signatures was proposed, and the local quantitative and structural features of the vertices were used to improve the filtering ability of the candidate sets. In the joining phase, a joining strategy based on candidate vertices was adopted. The space was pre-allocated at the granularity of the minimum number of neighbors, an efficient set operations was designed to realize the joining, and the extra overhead caused by the repeated joins in the traditional method was avoided. Experimental results of multiple datasets show that GpSI has obvious advantages in the filtering ability of the candidate set, the execution time, the GPU memory usage and the stability compared with the mainstream GPU subgraph matching algorithms. In a real data set experiment, the execution time of GpSI was 2 to 10 times faster compared to the execution time of GPU-friendly subgraph isomorphism algorithm.



Key wordssubgraph isomorphism      data mining      graphic processing unit (GPU)      parallel computing      high performance computing     
Received: 31 October 2022      Published: 16 October 2023
CLC:  TP 399  
Fund:  国家自然科学基金资助项目(61932004, 61732003); 中央高校基本科研基金资助项目(N181605012)
Corresponding Authors: Ye YUAN     E-mail: lee_anteng@qq.com;yuan-ye@bit.edu.cn
Cite this article:

An-teng LI,Peng-jie CUI,Ye YUAN,Guo-ren WANG. Research on subgraph matching optimization based on GPU. Journal of ZheJiang University (Engineering Science), 2023, 57(9): 1856-1864.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2023.09.017     OR     https://www.zjujournals.com/eng/Y2023/V57/I9/1856


基于GPU的子图匹配优化技术

提出高效的基于图形处理器(GPU)的子图匹配算法GpSI,针对主流算法的过滤阶段和连接阶段分别设计优化方案. 提出基于复合签名的过滤算法,在过滤阶段利用结点所处局部的数量特征和结构特征提升候选集过滤能力. 采用基于候选点的连接策略,在连接阶段以最小邻居数为粒度预分配空间,设计高效的集合运算,避免传统方法重复连接的额外开销. 多个数据集测试结果表明GpSI较主流GPU子图匹配算法在候选集过滤能力、执行用时、GPU内存占用和稳定性上均有明显优势. 在真实数据集测试中,相比GPU友好子图匹配算法,GpSI的执行用时加速2~10倍.


关键词: 子图同构,  数据挖掘,  图形处理器(GPU),  并行计算,  高性能计算 
Fig.1 Example of subgraph matching
Fig.2 Framework of GpSI
Fig.3 Quantity signature
Fig.4 Local paths of vertex
Fig.5 Structure signature
Fig.6 Composite signature
Fig.7 Generating new matching table
Fig.8 Intersection operation
数据集 NV NE MD 是否幂律图
facebook 4 039 88 234 1 045
enron 36 692 183 831 1 383
gowalla 196 591 950 327 14 730
dblp 317 080 1 049 866 343
road 1 379 917 1 921 660 12
Tab.1 Data set information of subgraph matching
Fig.9 Query graph
数据集 $\min {\text{|}}C(u){\text{|}}$ t/ms
GpSI GSI GpSI GSI
facebook 647 657 3 4
enron 2211 2429 4 5
gowalla 9299 14197 6 7
dblp 19475 23026 7 11
road 3869 47973 17 22
Tab.2 Filtering ability test results of two algorithms on different datasets
Fig.10 Comparison of execution time for algorithms
Fig.11 Comparison of graphic processing unit memory consumption for algorithms
Fig.12 Scalability test of three algorithms
[1]   WANG J, CHENG J. Truss decomposition in massive networks [C]// proceedings of the VLDB Endowment. [S.l.]: VLDB Endowment, 2012, 5(9): 812–823.
[2]   KAIRAM S R, WANG D J, LESKOVEC J. The life and death of online groups: predicting group growth and longevity [C]// Proceedings of the Fifth ACM International Conference on Web Search and Data Mining. [S.l.]: ACM, 2012: 673–682.
[3]   PÉREZ J, ARENAS M, GUTIERREZ C. Semantics and complexity of sparql [C]// ACM Transactions on Database Systems, 2009, 16: 1-45.
[4]   CHRISTOPHIDES V. Resource description framework (RDF) schema (RDFS) [M]// LIU L, ÖZSU M T. Encyclopedia of database systems. Boston: Springer, 2009: 2425-2428.
[5]   ZENG L, ZOU L Redesign of the gstore system[J]. Frontiers Computer Science, 2018, 12: 623- 641
doi: 10.1007/s11704-018-7212-z
[6]   YAN X F, YU P S, HAN J W. Graph indexing: a frequent structure-based approach [C]// Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data. [S.l.]: ACM, 2004: 335-346.
[7]   LIU H B, BLOUIN C, KEŠELJ V. Biological event extraction using subgraph matching [EB/OL]. [2022-06-01]. https://ceur-ws.org/Vol-714/ShortPaper03_Liu.pdf.
[8]   GAREY M R, JOHNSON D S. Computers and intractability: a guide to the theory of NP-completeness [M]. [S.l.]: W. H. Freeman and Company, 1979.
[9]   CONTE D, FOGGIA P, SANSONE C, et al Thirty years of graph matching in pattern recognition[J]. International Journal of Pattern Recognition and Artificial Intelligence, 2004, 18 (3): 265- 298
doi: 10.1142/S0218001404003228
[10]   LIN H, ZHU X W, YU B W, et al. ShenTu: processing multi-trillion edge graphs on millions of cores in seconds [C]// SC18: International Conference for High Performance Computing, Networking, Storage and Analysis. Dallas: IEEE, 2018: 706-716.
[11]   MENG K, LI J J, TAN G M, et al. A pattern based algorithmic autotuner for graph processing on GPUs [C]// Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming. [S.l.]: ACM, 2019: 201-213.
[12]   KHORASANI F, VORA K, GUPTA R, et al. CuSha: vertex-centric graph processing on GPUs [C]// Proceeding of the 23rd International Symposium on High-Performance Parallel and Distributed Computing. [S.l.]: ACM, 2014: 239-252.
[13]   ULLMANN J R. An algorithm for subgraph isomorphism [J] Journal of the ACM, 1976, 23(1): 31-42.
[14]   CORDELLA L P, FOGGIA P, SANSONE C, et al A (sub)graph isomorphism algorithm for matching large graphs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004, 26 (10): 1367- 1372
doi: 10.1109/TPAMI.2004.75
[15]   CARLETTI V, FOGGIA P, SAGGESE A, et al Challenging the time complexity of exact subgraph isomorphism for huge and dense graphs with VF3[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40 (4): 804- 818
doi: 10.1109/TPAMI.2017.2696940
[16]   HAN W S, LEE J, LEE J H. TurboISO: towards ultrafast and robust subgraph isomorphism search in large graph databases [C]// Proceedings of the ACM SIGMOD International Conference on Management of Data. [S.l.]: ACM, 2013: 337-348.
[17]   HE H, SINGH A K. Graphs-at-a-time: query language and access methods for graph databases [C]// Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. [S.l.]: ACM, 2008: 405-418.
[18]   SUN Z, WANG H Z, WANG H X, et al. Efficient subgraph matching on billion node graphs [C]// Proceedings of the VLDB Endowment. [S.l.]: VLDB Endowment, 2012, 5(9): 788-799.
[19]   ZHANG S J, LI S R, YANG J. GADDI: distance index based subgraph matching in biological networks [C]// Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology. [S.l.]: ACM, 2009: 192-203.
[20]   LEE J, HAN W S, KASPEROVICS R, et al. An in-depth comparison of subgraph isomorphism algorithms in graph databases [C]// Proceedings of the VLDB Endowment. [S.l.]: VLDB Endowment, 2012, 6(2): 133-144.
[21]   徐周波, 李珍, 刘华东, 等 基于邻居信息聚合的子图同构匹配算法[J]. 计算机应用, 2021, 41 (1): 43- 47
XU Zhou-bo, LI Zhen, LIU Hua-dong, et al Subgraph isomorphism matching algorithm based on neighbor information aggregation[J]. Journal of Computer Applications, 2021, 41 (1): 43- 47
[22]   MOORMAN J D, CHEN Q Y, TU T K, et al. Filtering methods for subgraph matching on multiplex networks [C]// Proceedings of the 2018 IEEE International Conference on Big Data. Seattle: IEEE, 2018: 3980-3985.
[23]   TRAN H N, KIM J J, HE B S. Fast subgraph matching on large graphs using graphics processors [C]// Proceedings of the 2015 International Conference on Database Systems for Advanced Applications. [S.l.]: Springer, 2015: 299-315.
[24]   WANG L, OWENS J D. Fast Gunrock subgraph matching (GSM) on GPUs [EB/OL]. [2022-06-01]. https://arxiv.org/pdf/2003.01527.pdf.
[25]   WANG Y Z H, DAVIDSON A, PAN Y C, et al. Gunrock: a high-performance graph processing library on the GPU [C]// Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. [S.l.]: ACM, 2015: 265-266
[26]   ZENG L, ZOU L, ÖZSU M T, et al. GSI: GPU-friendly subgraph isomorphism [C]// Proceedings of the 2020 IEEE 36th International Conference on Data Engineering. Dallas: IEEE, 2020: 1249-1260.
[27]   BONNICI V, GIUGNO R, BOMBIERI N. An efficient implementation of a subgraph isomorphism algorithm for GPUs [C]// Proceedings of the 2018 IEEE International Conference on Bioinformatics and Biomedicine. Madrid: IEEE, 2018: 2674-2681.
[28]   汪洋, 江世杰, 曹宇聪, 等 多编码树GPU并行轴心子图匹配[J]. 计算机应用, 2022, 42 (1): 132- 139
WAGN Yang, JIANG Shi-jie, CAO Yu-cong, et al Parallel pivoted subgraph matching with multiple coding trees on GPU[J]. Journal of Computer Applications, 2022, 42 (1): 132- 139
[29]   KIRK D. NVIDIA CUDA software and GPU parallel computing architecture [C]// Proceedings of the 6th International Symposium on Memory Management. [S.l.]: ACM, 2007: 103–104.
[30]   NVIDIA: CUDA Toolkit documentation v10.2 [CP/OL]. (2019-11-28)[2022-06-01]. https://docs.nvidia.com/cuda/archive/10.2/.
[31]   ASHARI A, SEDAGHATI N, EISENLOHR J, et al. Fast sparse matrix-vector multiplication on GPUs for graph applications [C]// Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. New Orleans: IEEE, 2014: 781-792.
[32]   李岑浩, 崔鹏杰, 袁野, 等 面向大图子图匹配的多GPU编程模型[J]. 计算机科学与探索, 2023, 17 (7): 1576- 1585
LI Cen-hao, CUI Peng-jie, YUAN Ye, et al Muti-GPU programming model for subgraph matching in large graphs[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17 (7): 1576- 1585
[33]   LESKOVEC J. Stanford large network dataset collection [DS/OL]. (2021-09-17)[2022-06-01]. http://snap.stanford.edu/data/.
[34]   REZA T, RIPEANU M, TRIPOUL N, et al. PruneJuice: pruning trillion-edge graphs to a precise pattern-matching solution [C]// Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis. Dallas: IEEE, 2018: 1-17.
[35]   Boost Library: VF2 (Sub)Graph Isomorphism-master [CP/OL]. (2013-04-26) [2022-06-01]. https://www.boost.org/doc/libs/master/libs/graph.
[1] Gen LI,Wei ZHAI,Lan WU. Study of merging interactions based on gradient boosting decision tree[J]. Journal of ZheJiang University (Engineering Science), 2022, 56(4): 649-655.
[2] Tian-lun DAI,Bo-han LI,Ya-lei ZANG,Hua DAI,Zi-qiang YU,Gang CHEN. PORP: parallel optimization strategy of route planning for self-driving vehicles[J]. Journal of ZheJiang University (Engineering Science), 2022, 56(2): 329-337.
[3] He-xiang LIN,Lian-peng QIAO,Ye YUAN,Guo-ren WANG. Keyword search algorithm of large graph based on GPU[J]. Journal of ZheJiang University (Engineering Science), 2022, 56(2): 271-279.
[4] Yu CHEN,Hua DAI,Bo-han LI,Geng YANG. Loose infection pattern mining algorithms over moving objects[J]. Journal of ZheJiang University (Engineering Science), 2022, 56(2): 280-287.
[5] Xin-yue LI,Shu-qin CHEN,Hong-liang LI,Yun-xiao LOU,Jia-he LI. Analysis of air-conditioning usage and energy consumption in campus teaching buildings with data mining[J]. Journal of ZheJiang University (Engineering Science), 2020, 54(9): 1677-1689.
[6] Li LONG,Shan-suo ZHENG,Yan ZHOU,Jin-chuan HE,Hong-li MENG,Yong-long CAI. Parallel study of seismic reliability analysis of water supply pipe network based on quasi-Monte Carlo method[J]. Journal of ZheJiang University (Engineering Science), 2020, 54(2): 241-247.
[7] ZHOU Jia-li, CHEN Yi-jun, WU Min. Image acquisition and preprocessing method based on FPGA monitor[J]. Journal of ZheJiang University (Engineering Science), 2018, 52(2): 398-405.
[8] WANG Cheng-long, LI Cheng, FENG Yi-ping, RONG Gang. Dispatching rule extraction method for job shop scheduling problem[J]. Journal of ZheJiang University (Engineering Science), 2015, 49(3): 421-429.
[9] HU De-chao, CHI Long-zhe, YANG Qiong, WANG Min. Development mechanics of scouring funnel before dam of reservoir[J]. Journal of ZheJiang University (Engineering Science), 2015, 49(2): 257-264.
[10] KE Hai-feng, YING Jing. Real-time license character recognition technology based on R-ELM[J]. Journal of ZheJiang University (Engineering Science), 2014, 48(7): 1209-1216.
[11] KE Hai-feng, YING Jing. Real-time license character recognition technology based on R-ELM[J]. Journal of ZheJiang University (Engineering Science), 2014, 48(2): 0-0.
[12] WANG Xiang-bing,TONG Shui-guang,ZHONG Wei,ZHANG Jian. Study on scheme design technique for hydraulic excavator's structure performance based on extension reuse[J]. Journal of ZheJiang University (Engineering Science), 2013, 47(11): 1992-2002.
[13] MA Jin, Li-Feng, LI Jian-Hua. Perturbation method for distributed privacy-preserving data mining[J]. Journal of ZheJiang University (Engineering Science), 2010, 44(2): 276-282.
[14] DAN Fu-Dong, CA Ming, TUN Hong-Sen, DONG Jin-Xiang, FU Gao. An incremental algorithm for mining frequent closed patterns[J]. Journal of ZheJiang University (Engineering Science), 2009, 43(8): 1389-1395.
[15] WANG Guang-Ceng, CAO Yi-Jia, CU Jun, et al. Design and implementation of general report integrated management system for power company[J]. Journal of ZheJiang University (Engineering Science), 2009, 43(11): 2062-2066.