Please wait a minute...
Journal of ZheJiang University (Engineering Science)  2022, Vol. 56 Issue (2): 271-279    DOI: 10.3785/j.issn.1008-973X.2022.02.007
    
Keyword search algorithm of large graph based on GPU
He-xiang LIN1(),Lian-peng QIAO2,Ye YUAN1,*(),Guo-ren WANG1
1. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
2. School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China
Download: HTML     PDF(2682KB) HTML
Export: BibTeX | EndNote (RIS)      

Abstract  

A new keyword search algorithm on graphics processing unit (GPU) was designed based on the research of the traditional keyword search problem on graphs. First of all, a keyword search problem based on Steiner tree semantics was defined. A basic algorithm was designed on the CPU in combination with the traditional all-pair shortest path algorithm. The algorithm could not be directly transplanted to the GPU due to the characteristics of the CPU architecture. Secondly, a basic search algorithm on GPU was designed, and its advantages and remaining shortcomings compared to the CPU version were analyzed. To improve the query speed of the algorithm, an index-based optimization technique was proposed reflecting on the shortcomings of the basic search algorithm on GPU. An efficient keyword search algorithm on GPU was designed, using the relaxing and updating idea of the single-source shortest path algorithm, keyword independence, and internal integrity. Finally, an optimization idea on GPU for the r-cliques keyword search problem was proposed based on the idea of the algorithm. By analyzing the complexity of the algorithm and conducting experiments on real data sets, the correctness and effectiveness of the GPU algorithm are proved, and it is proved that the algorithm still has strong computing performance on large-scale graph data.



Key wordskeyword search      attributed graph      index      general-purpose computing on GPU      parallel computing     
Received: 15 July 2021      Published: 03 March 2022
CLC:  TP 399  
Corresponding Authors: Ye YUAN     E-mail: flashcattail@qq.com;yuan-ye@bit.edu.cn
Cite this article:

He-xiang LIN,Lian-peng QIAO,Ye YUAN,Guo-ren WANG. Keyword search algorithm of large graph based on GPU. Journal of ZheJiang University (Engineering Science), 2022, 56(2): 271-279.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2022.02.007     OR     https://www.zjujournals.com/eng/Y2022/V56/I2/271


基于GPU的大图数据上的关键字检索算法

在传统图上关键字检索问题研究的基础上,基于图形处理器(GPU)设计新的关键字检索算法. 基于Steiner tree语义定义关键字检索问题,针对该问题结合传统多源最短路径算法在CPU上设计基本算法,由于CPU架构特性,该算法无法直接移植到GPU上. 提出GPU上的基本检索算法,分析它相对于CPU版本的优势和仍然存在的不足. 为了提升算法查询速度,反思GPU上基本检索算法的不足之处,提出基于索引的优化技术,利用单源最短路径算法的松弛更新思想、关键字独立性和内部整体性,设计GPU上的高效关键字检索算法. 扩展该算法思想,对r-cliques关键字检索问题提出GPU上的优化思路. 通过分析算法复杂度并在真实数据集上进行实验,证明该GPU算法的正确性和有效性,并证明算法在较大规模图数据上仍有较强的计算性能.


关键词: 关键字检索,  属性图,  索引,  GPU通用计算,  并行计算 
Fig.1 Example graph of keyword search and answer tree
节点 $v$ ${ {{\rm{Score}}} _v}$ ${ {\boldsymbol{K} }_{ {T_v} } }$
${v_1}$ $5$ $[{v_2},{v_3}]$
${v_2}$ $2$ $[{v_2},{v_5}]$
${v_3}$ $2$ $[{v_2},{v_3}]$
${v_4}$ $1$ $[{v_4},{v_5}]$
${v_5}$ $0$ $[{v_5},{v_5}]$
Tab.1 Result of example graph of keyword search
Fig.2 Example graph for $r$-cliques problem
数据集 节点数 边数 最多标签 总关键字
D1 2 668 209 504 1 191
D2 6 410 75 052 60 112
D3 8 228 888 49 691 058 155 220
D4 13 629 975 171 789 964 1 293
D5 48 963 638 483 795 764 4 198
Tab.2 Scale infomation of data sets for keyword search experiment
Fig.3 Overview of execution time of three keyword search algorithms
Fig.4 Display for result of sample data of AdvG
Fig.5 Display for extension of vertices of AdvG
Fig.6 Display for extension of edges of AdvG
[1]   DING B, YU J X, WANG S, et al. Finding top-k min-cost connected trees in databases [C]// 2007 IEEE 23rd International Conference on Data Engineering. Istanbul: IEEE, 2007: 836-845.
[2]   KARGAR M, AN A Keyword search in graphs: finding r-cliques[J]. Proceedings of the VLDB Endowment, 2011, 4 (10): 681- 692
doi: 10.14778/2021017.2021025
[3]   BHALOTIA G, HULGERI A, NAKHE C, et al. Keyword searching and browsing in databases using BANKS [C]// Proceedings of 18th International Conference on Data Engineering. San Jose: IEEE, 2002: 431-440.
[4]   HE H, WANG H, YANG J, et al. Blinks: ranked keyword searches on graphs [C]// Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data. Beijing: ACM, 2007: 305-316.
[5]   WANG Y, DAVIDSON A, PAN Y, et al. Gunrock: a high-performance graph processing library on the GPU [C]// Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. Barcelona: ACM, 2016: 1-12.
[6]   PAGE L, BRIN S, MOTWANI R, et al. The PageRank citation ranking: bringing order to the web[R].[s.l.]: Stanford InfoLab, 1999.
[7]   FLOYD R W Algorithm 97: shortest path[J]. Communications of the ACM, 1962, 5 (6): 345
[8]   HARISH P, NARAYANAN P J. Accelerating large graph algorithms on the GPU using CUDA [C]// International Conference on High-Performance Computing. Goa: Springer, 2007: 197-208.
[9]   BELLMAN R On a routing problem[J]. Quarterly of Applied Mathematics, 1958, 16 (1): 87- 90
doi: 10.1090/qam/102435
[10]   FORD JR L R. Network flow theory [R]. [s.l.]: Rand Corp, 1956.
[11]   YANG D, ZHANG D, YU Z, et al. Fine-grained preference-aware location search leveraging crowdsourced digital footprints from LBSNs [C]// Proceedings of the 2013 ACM International Joint Conference on Pervasive and Ubiquitous Computing. Zurich: ACM, 2013: 479-488.
[12]   YANG D, ZHANG D, YU Z, et al. A sentiment-enhanced personalized location recommendation system [C]// Proceedings of the 24th ACM Conference on Hypertext and Social Media. Paris: ACM, 2013: 119-128.
[13]   LEY M DBLP: some lessons learned[J]. Proceedings of the VLDB Endowment, 2009, 2 (2): 1493- 1500
doi: 10.14778/1687553.1687577
[14]   BAST H, BÄURLE F, BUCHHOLD B, et al. Easy access to the freebase dataset [C]// Proceedings of the 23rd International Conference on World Wide Web. Seoul: ACM, 2014: 95-98.
[15]   AGGARWAL C C, WANG H X. Managing and mining graph data [M]. New York: Springer, 2010.
[16]   AGRAWAL S, CHAUDHURI S, DAS G. DBXplorer: a system for keyword-based search over relational databases [C]// Proceedings of 18th International Conference on Data Engineering. San Jose: IEEE, 2002: 5-16.
[17]   DOSSO D. A keyword search and citation system for RDF graphs [C]// FDIA. Milan: [s.n.], 2019: 23-28.
[18]   GHANBARPOUR A, NADERI H Survey on ranking functions in keyword search over graph-structured data[J]. Journal of Universal Computer Science, 2019, 25 (4): 361- 389
[19]   YANG Y, AGRAWAL D, JAGADISH H V, et al. An efficient parallel keyword search engine on knowledge graphs [C]// 2019 IEEE 35th International Conference on Data Engineering. Macao: IEEE, 2019: 338-349.
[20]   XIAO T, HAN D, HE J, et al Multi-keyword ranked search based on mapping set matching in cloud ciphertext storage system[J]. Connection Science, 2021, 33 (1): 95- 112
doi: 10.1080/09540091.2020.1753175
[21]   HU X, DUAN J, DANG D Natural language question answering over knowledge graph: the marriage of SPARQL query and keyword search[J]. Knowledge and Information Systems, 2021, 63 (4): 819- 844
doi: 10.1007/s10115-020-01534-4
[22]   FUNG J, TANG F, MANN S. Mediated reality using computer graphics hardware for computer vision [C]// Proceedings of the 6th International Symposium on Wearable Computers. Seattle: IEEE, 2002: 83-89.
[23]   FUNG J, MANN S. Computer vision signal processing on graphics processing units [C]// 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing. Montreal: IEEE, 2004, 5: V-93.
[24]   CHITTY D M. A data parallel approach to genetic programming using programmable graphics hardware [C]// Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation. London: ACM, 2007: 1566-1573.
[25]   SOMAN J, KISHORE K, NARAYANAN P J. A fast GPU algorithm for graph connectivity [C]// 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum. Atlanta: IEEE, 2010: 1-8.
[26]   VINEET V, NARAYANAN P J. CUDA cuts: fast graph cuts on the GPU [C]// 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. Anchorage: IEEE, 2008: 1-8.
[27]   HAWICK K A, LEIST A, PLAYNE D P Parallel graph component labelling with GPUs and CUDA[J]. Parallel Computing, 2010, 36 (12): 655- 678
doi: 10.1016/j.parco.2010.07.002
[28]   OSAMA M, TRUONG M, YANG C, et al. Graph coloring on the GPU [C]// 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). Rio de Janeiro: IEEE, 2019: 231-240.
[29]   AWAD M A, ASHKIANI S, PORUMBESCU S D, et al. Dynamic graphs on the GPU [C]// 2020 IEEE International Parallel and Distributed Processing Symposium. New Orleans: IEEE, 2020: 739-748.
[30]   ZHOU C, ZHANG T High performance graph data imputation on multiple GPUs[J]. Future Internet, 2021, 13 (2): 36
doi: 10.3390/fi13020036
[31]   MIN S W, WU K, HUANG S, et al. PyTorch-direct: enabling GPU centric data access for very large graph neural network training with irregular accesses[EB/OL]. [2021-06-01]. https://arxiv.org/abs/2101.07956.
[32]   KHORASANI F, VORA K, GUPTA R, et al. CuSha: vertex-centric graph processing on GPUs [C]// Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing. Vancouver BC: ACM, 2014: 239-252.
[33]   ZHU H, HE L, FU S, et al Wolfpath: accelerating iterative traversing-based graph processing algorithms on GPU[J]. International Journal of Parallel Programming, 2019, 47 (4): 644- 667
doi: 10.1007/s10766-017-0533-y
[1] Xi-yun YANG,Ya-xin LIU,Wen-bing MA,Guo-tong XING,Feng GAO. Bidding strategy of wind farm participation in frequency regulation market considering wind power uncertainty[J]. Journal of ZheJiang University (Engineering Science), 2022, 56(4): 736-744.
[2] Si-han DONG,Jun-chang XIN,Kun HAO,Zhong-ming YAO,Jin-yi CHEN. A join query optimization algorithm in multi-blockchain environment[J]. Journal of ZheJiang University (Engineering Science), 2022, 56(2): 313-321.
[3] Tian-lun DAI,Bo-han LI,Ya-lei ZANG,Hua DAI,Zi-qiang YU,Gang CHEN. PORP: parallel optimization strategy of route planning for self-driving vehicles[J]. Journal of ZheJiang University (Engineering Science), 2022, 56(2): 329-337.
[4] Chu-hang YANG,Shao-hui JIA,Ye-cheng MA,Jing-wei ZHU,Yong LIU,Gao-rong HAN. Three-dimensional numerical engineering simulation of oxy-fuel high alumina glass furnace[J]. Journal of ZheJiang University (Engineering Science), 2021, 55(2): 410-418.
[5] Juan-juan REN,Kuan LIU,Wei-hua WANG,Ying ZHANG,Ke-xin YANG,Ming-ming LIU. Evaluation of cracking condition for CRTS Ⅲ prefabricated slab track based on interval analytic hierarchy process[J]. Journal of ZheJiang University (Engineering Science), 2021, 55(12): 2267-2274.
[6] Qi-zhen KANG,Jing-jing LI,Yu-chao LI,Shi-yuan YAO,Yun-min CHEN. Permeability of sodium polyacrylate modified calcium bentonite in acid-base salt solution[J]. Journal of ZheJiang University (Engineering Science), 2021, 55(10): 1877-1884.
[7] Jun-ming WANG,Xing-ya ZHAO,Ling-hong CHEN,Li-xia HAN,Xiang GAO,Ke-fa CEN. Ammonia effect on optical properties of secondary organic aerosols[J]. Journal of ZheJiang University (Engineering Science), 2020, 54(9): 1812-1818.
[8] Fei WANG,Guo-fang GONG,Li-wen DUAN,Yong-feng QIN. XGBoost based intelligent determination system design of tunnel boring machine operation parameters[J]. Journal of ZheJiang University (Engineering Science), 2020, 54(4): 633-641.
[9] Bing YANG,Wen-bo MO,Jin-liang YAO. 3D palmprint recognition by using local features and deep learning[J]. Journal of ZheJiang University (Engineering Science), 2020, 54(3): 540-545.
[10] Li LONG,Shan-suo ZHENG,Yan ZHOU,Jin-chuan HE,Hong-li MENG,Yong-long CAI. Parallel study of seismic reliability analysis of water supply pipe network based on quasi-Monte Carlo method[J]. Journal of ZheJiang University (Engineering Science), 2020, 54(2): 241-247.
[11] Ai-min ZHOU,Cai-xia ZHOU,Jin-yan OUYANG,Shu-tao ZHANG. Model of synthetic evaluation on interface stylistic beauty based on moderately standardized of index[J]. Journal of ZheJiang University (Engineering Science), 2020, 54(12): 2273-2285.
[12] Liang CAI,Hong-cen ZHOU,Heng BAI,Zhen-gong CAI,Ke-ting YIN,Yi-jun BEI. Application load forecasting method based on multi-layer bidirectional LSTM and improved PSO algorithm[J]. Journal of ZheJiang University (Engineering Science), 2020, 54(12): 2414-2422.
[13] LI Ying, YAN Guo-feng, WANG Yu, HUANG Yu-xin, DU Jian-xiang, XIE Li-juan, HE Sai-ling. Classification of tobacco aroma based on Terahertz time domain spectroscopy[J]. Journal of ZheJiang University (Engineering Science), 2018, 52(3): 537-542.
[14] ZHOU Jia-li, CHEN Yi-jun, WU Min. Image acquisition and preprocessing method based on FPGA monitor[J]. Journal of ZheJiang University (Engineering Science), 2018, 52(2): 398-405.
[15] LIU Li, Krewinkel B C, Booij M J, XU Yue-Ping. Correlation between climate indexes and precipitation of Lanjiang River[J]. Journal of ZheJiang University (Engineering Science), 2018, 52(12): 2332-2341.