1. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China 2. School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China
A new keyword search algorithm on graphics processing unit (GPU) was designed based on the research of the traditional keyword search problem on graphs. First of all, a keyword search problem based on Steiner tree semantics was defined. A basic algorithm was designed on the CPU in combination with the traditional all-pair shortest path algorithm. The algorithm could not be directly transplanted to the GPU due to the characteristics of the CPU architecture. Secondly, a basic search algorithm on GPU was designed, and its advantages and remaining shortcomings compared to the CPU version were analyzed. To improve the query speed of the algorithm, an index-based optimization technique was proposed reflecting on the shortcomings of the basic search algorithm on GPU. An efficient keyword search algorithm on GPU was designed, using the relaxing and updating idea of the single-source shortest path algorithm, keyword independence, and internal integrity. Finally, an optimization idea on GPU for the r-cliques keyword search problem was proposed based on the idea of the algorithm. By analyzing the complexity of the algorithm and conducting experiments on real data sets, the correctness and effectiveness of the GPU algorithm are proved, and it is proved that the algorithm still has strong computing performance on large-scale graph data.
He-xiang LIN,Lian-peng QIAO,Ye YUAN,Guo-ren WANG. Keyword search algorithm of large graph based on GPU. Journal of ZheJiang University (Engineering Science), 2022, 56(2): 271-279.
Fig.1Example graph of keyword search and answer tree
节点 $v$
${ {{\rm{Score}}} _v}$
${ {\boldsymbol{K} }_{ {T_v} } }$
${v_1}$
$5$
$[{v_2},{v_3}]$
${v_2}$
$2$
$[{v_2},{v_5}]$
${v_3}$
$2$
$[{v_2},{v_3}]$
${v_4}$
$1$
$[{v_4},{v_5}]$
${v_5}$
$0$
$[{v_5},{v_5}]$
Tab.1Result of example graph of keyword search
Fig.2Example graph for $r$-cliques problem
数据集
节点数
边数
最多标签
总关键字
D1
2 668
209 504
1
191
D2
6 410
75 052
60
112
D3
8 228 888
49 691 058
155
220
D4
13 629 975
171 789 964
1
293
D5
48 963 638
483 795 764
4
198
Tab.2Scale infomation of data sets for keyword search experiment
Fig.3Overview of execution time of three keyword search algorithms
Fig.4Display for result of sample data of AdvG
Fig.5Display for extension of vertices of AdvG
Fig.6Display for extension of edges of AdvG
[1]
DING B, YU J X, WANG S, et al. Finding top-k min-cost connected trees in databases [C]// 2007 IEEE 23rd International Conference on Data Engineering. Istanbul: IEEE, 2007: 836-845.
[2]
KARGAR M, AN A Keyword search in graphs: finding r-cliques[J]. Proceedings of the VLDB Endowment, 2011, 4 (10): 681- 692
doi: 10.14778/2021017.2021025
[3]
BHALOTIA G, HULGERI A, NAKHE C, et al. Keyword searching and browsing in databases using BANKS [C]// Proceedings of 18th International Conference on Data Engineering. San Jose: IEEE, 2002: 431-440.
[4]
HE H, WANG H, YANG J, et al. Blinks: ranked keyword searches on graphs [C]// Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data. Beijing: ACM, 2007: 305-316.
[5]
WANG Y, DAVIDSON A, PAN Y, et al. Gunrock: a high-performance graph processing library on the GPU [C]// Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. Barcelona: ACM, 2016: 1-12.
[6]
PAGE L, BRIN S, MOTWANI R, et al. The PageRank citation ranking: bringing order to the web[R].[s.l.]: Stanford InfoLab, 1999.
[7]
FLOYD R W Algorithm 97: shortest path[J]. Communications of the ACM, 1962, 5 (6): 345
[8]
HARISH P, NARAYANAN P J. Accelerating large graph algorithms on the GPU using CUDA [C]// International Conference on High-Performance Computing. Goa: Springer, 2007: 197-208.
[9]
BELLMAN R On a routing problem[J]. Quarterly of Applied Mathematics, 1958, 16 (1): 87- 90
doi: 10.1090/qam/102435
[10]
FORD JR L R. Network flow theory [R]. [s.l.]: Rand Corp, 1956.
[11]
YANG D, ZHANG D, YU Z, et al. Fine-grained preference-aware location search leveraging crowdsourced digital footprints from LBSNs [C]// Proceedings of the 2013 ACM International Joint Conference on Pervasive and Ubiquitous Computing. Zurich: ACM, 2013: 479-488.
[12]
YANG D, ZHANG D, YU Z, et al. A sentiment-enhanced personalized location recommendation system [C]// Proceedings of the 24th ACM Conference on Hypertext and Social Media. Paris: ACM, 2013: 119-128.
[13]
LEY M DBLP: some lessons learned[J]. Proceedings of the VLDB Endowment, 2009, 2 (2): 1493- 1500
doi: 10.14778/1687553.1687577
[14]
BAST H, BÄURLE F, BUCHHOLD B, et al. Easy access to the freebase dataset [C]// Proceedings of the 23rd International Conference on World Wide Web. Seoul: ACM, 2014: 95-98.
[15]
AGGARWAL C C, WANG H X. Managing and mining graph data [M]. New York: Springer, 2010.
[16]
AGRAWAL S, CHAUDHURI S, DAS G. DBXplorer: a system for keyword-based search over relational databases [C]// Proceedings of 18th International Conference on Data Engineering. San Jose: IEEE, 2002: 5-16.
[17]
DOSSO D. A keyword search and citation system for RDF graphs [C]// FDIA. Milan: [s.n.], 2019: 23-28.
[18]
GHANBARPOUR A, NADERI H Survey on ranking functions in keyword search over graph-structured data[J]. Journal of Universal Computer Science, 2019, 25 (4): 361- 389
[19]
YANG Y, AGRAWAL D, JAGADISH H V, et al. An efficient parallel keyword search engine on knowledge graphs [C]// 2019 IEEE 35th International Conference on Data Engineering. Macao: IEEE, 2019: 338-349.
[20]
XIAO T, HAN D, HE J, et al Multi-keyword ranked search based on mapping set matching in cloud ciphertext storage system[J]. Connection Science, 2021, 33 (1): 95- 112
doi: 10.1080/09540091.2020.1753175
[21]
HU X, DUAN J, DANG D Natural language question answering over knowledge graph: the marriage of SPARQL query and keyword search[J]. Knowledge and Information Systems, 2021, 63 (4): 819- 844
doi: 10.1007/s10115-020-01534-4
[22]
FUNG J, TANG F, MANN S. Mediated reality using computer graphics hardware for computer vision [C]// Proceedings of the 6th International Symposium on Wearable Computers. Seattle: IEEE, 2002: 83-89.
[23]
FUNG J, MANN S. Computer vision signal processing on graphics processing units [C]// 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing. Montreal: IEEE, 2004, 5: V-93.
[24]
CHITTY D M. A data parallel approach to genetic programming using programmable graphics hardware [C]// Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation. London: ACM, 2007: 1566-1573.
[25]
SOMAN J, KISHORE K, NARAYANAN P J. A fast GPU algorithm for graph connectivity [C]// 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum. Atlanta: IEEE, 2010: 1-8.
[26]
VINEET V, NARAYANAN P J. CUDA cuts: fast graph cuts on the GPU [C]// 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. Anchorage: IEEE, 2008: 1-8.
[27]
HAWICK K A, LEIST A, PLAYNE D P Parallel graph component labelling with GPUs and CUDA[J]. Parallel Computing, 2010, 36 (12): 655- 678
doi: 10.1016/j.parco.2010.07.002
[28]
OSAMA M, TRUONG M, YANG C, et al. Graph coloring on the GPU [C]// 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). Rio de Janeiro: IEEE, 2019: 231-240.
[29]
AWAD M A, ASHKIANI S, PORUMBESCU S D, et al. Dynamic graphs on the GPU [C]// 2020 IEEE International Parallel and Distributed Processing Symposium. New Orleans: IEEE, 2020: 739-748.
[30]
ZHOU C, ZHANG T High performance graph data imputation on multiple GPUs[J]. Future Internet, 2021, 13 (2): 36
doi: 10.3390/fi13020036
[31]
MIN S W, WU K, HUANG S, et al. PyTorch-direct: enabling GPU centric data access for very large graph neural network training with irregular accesses[EB/OL]. [2021-06-01]. https://arxiv.org/abs/2101.07956.
[32]
KHORASANI F, VORA K, GUPTA R, et al. CuSha: vertex-centric graph processing on GPUs [C]// Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing. Vancouver BC: ACM, 2014: 239-252.
[33]
ZHU H, HE L, FU S, et al Wolfpath: accelerating iterative traversing-based graph processing algorithms on GPU[J]. International Journal of Parallel Programming, 2019, 47 (4): 644- 667
doi: 10.1007/s10766-017-0533-y