Please wait a minute...
浙江大学学报(工学版)  2022, Vol. 56 Issue (2): 271-279    DOI: 10.3785/j.issn.1008-973X.2022.02.007
计算机与控制工程     
基于GPU的大图数据上的关键字检索算法
林鹤翔1(),乔连鹏2,袁野1,*(),王国仁1
1. 北京理工大学 计算机学院,北京 100081
2. 东北大学 计算机科学与工程学院,辽宁 沈阳 110169
Keyword search algorithm of large graph based on GPU
He-xiang LIN1(),Lian-peng QIAO2,Ye YUAN1,*(),Guo-ren WANG1
1. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
2. School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China
 全文: PDF(2682 KB)   HTML
摘要:

在传统图上关键字检索问题研究的基础上,基于图形处理器(GPU)设计新的关键字检索算法. 基于Steiner tree语义定义关键字检索问题,针对该问题结合传统多源最短路径算法在CPU上设计基本算法,由于CPU架构特性,该算法无法直接移植到GPU上. 提出GPU上的基本检索算法,分析它相对于CPU版本的优势和仍然存在的不足. 为了提升算法查询速度,反思GPU上基本检索算法的不足之处,提出基于索引的优化技术,利用单源最短路径算法的松弛更新思想、关键字独立性和内部整体性,设计GPU上的高效关键字检索算法. 扩展该算法思想,对r-cliques关键字检索问题提出GPU上的优化思路. 通过分析算法复杂度并在真实数据集上进行实验,证明该GPU算法的正确性和有效性,并证明算法在较大规模图数据上仍有较强的计算性能.

关键词: 关键字检索属性图索引GPU通用计算并行计算    
Abstract:

A new keyword search algorithm on graphics processing unit (GPU) was designed based on the research of the traditional keyword search problem on graphs. First of all, a keyword search problem based on Steiner tree semantics was defined. A basic algorithm was designed on the CPU in combination with the traditional all-pair shortest path algorithm. The algorithm could not be directly transplanted to the GPU due to the characteristics of the CPU architecture. Secondly, a basic search algorithm on GPU was designed, and its advantages and remaining shortcomings compared to the CPU version were analyzed. To improve the query speed of the algorithm, an index-based optimization technique was proposed reflecting on the shortcomings of the basic search algorithm on GPU. An efficient keyword search algorithm on GPU was designed, using the relaxing and updating idea of the single-source shortest path algorithm, keyword independence, and internal integrity. Finally, an optimization idea on GPU for the r-cliques keyword search problem was proposed based on the idea of the algorithm. By analyzing the complexity of the algorithm and conducting experiments on real data sets, the correctness and effectiveness of the GPU algorithm are proved, and it is proved that the algorithm still has strong computing performance on large-scale graph data.

Key words: keyword search    attributed graph    index    general-purpose computing on GPU    parallel computing
收稿日期: 2021-07-15 出版日期: 2022-03-03
CLC:  TP 399  
基金资助: 国家自然科学基金资助项目(61932004, 61732003, 61729201);中央高校基本科研基金资助项目(N181605012)
通讯作者: 袁野     E-mail: flashcattail@qq.com;yuan-ye@bit.edu.cn
作者简介: 林鹤翔(1997—),男,硕士生,从事GPU大图计算研究. orcid.org/0000-0002-9183-8508. E-mail: flashcattail@qq.com
服务  
把本文推荐给朋友
加入引用管理器
E-mail Alert
作者相关文章  
林鹤翔
乔连鹏
袁野
王国仁

引用本文:

林鹤翔,乔连鹏,袁野,王国仁. 基于GPU的大图数据上的关键字检索算法[J]. 浙江大学学报(工学版), 2022, 56(2): 271-279.

He-xiang LIN,Lian-peng QIAO,Ye YUAN,Guo-ren WANG. Keyword search algorithm of large graph based on GPU. Journal of ZheJiang University (Engineering Science), 2022, 56(2): 271-279.

链接本文:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2022.02.007        https://www.zjujournals.com/eng/CN/Y2022/V56/I2/271

图 1  关键字检索与答案树例图
节点 $v$ ${ {{\rm{Score}}} _v}$ ${ {\boldsymbol{K} }_{ {T_v} } }$
${v_1}$ $5$ $[{v_2},{v_3}]$
${v_2}$ $2$ $[{v_2},{v_5}]$
${v_3}$ $2$ $[{v_2},{v_3}]$
${v_4}$ $1$ $[{v_4},{v_5}]$
${v_5}$ $0$ $[{v_5},{v_5}]$
表 1  关键字检索例图求解结果
图 2   $r$-cliques问题图例
数据集 节点数 边数 最多标签 总关键字
D1 2 668 209 504 1 191
D2 6 410 75 052 60 112
D3 8 228 888 49 691 058 155 220
D4 13 629 975 171 789 964 1 293
D5 48 963 638 483 795 764 4 198
表 2  关键字检索实验数据集规模信息
图 3  3种关键字检索算法执行时间总览
图 4  AdvG算法样例数据结果展示
图 5  AdvG算法点扩展性展示
图 6  AdvG算法边扩展性展示
1 DING B, YU J X, WANG S, et al. Finding top-k min-cost connected trees in databases [C]// 2007 IEEE 23rd International Conference on Data Engineering. Istanbul: IEEE, 2007: 836-845.
2 KARGAR M, AN A Keyword search in graphs: finding r-cliques[J]. Proceedings of the VLDB Endowment, 2011, 4 (10): 681- 692
doi: 10.14778/2021017.2021025
3 BHALOTIA G, HULGERI A, NAKHE C, et al. Keyword searching and browsing in databases using BANKS [C]// Proceedings of 18th International Conference on Data Engineering. San Jose: IEEE, 2002: 431-440.
4 HE H, WANG H, YANG J, et al. Blinks: ranked keyword searches on graphs [C]// Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data. Beijing: ACM, 2007: 305-316.
5 WANG Y, DAVIDSON A, PAN Y, et al. Gunrock: a high-performance graph processing library on the GPU [C]// Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. Barcelona: ACM, 2016: 1-12.
6 PAGE L, BRIN S, MOTWANI R, et al. The PageRank citation ranking: bringing order to the web[R].[s.l.]: Stanford InfoLab, 1999.
7 FLOYD R W Algorithm 97: shortest path[J]. Communications of the ACM, 1962, 5 (6): 345
8 HARISH P, NARAYANAN P J. Accelerating large graph algorithms on the GPU using CUDA [C]// International Conference on High-Performance Computing. Goa: Springer, 2007: 197-208.
9 BELLMAN R On a routing problem[J]. Quarterly of Applied Mathematics, 1958, 16 (1): 87- 90
doi: 10.1090/qam/102435
10 FORD JR L R. Network flow theory [R]. [s.l.]: Rand Corp, 1956.
11 YANG D, ZHANG D, YU Z, et al. Fine-grained preference-aware location search leveraging crowdsourced digital footprints from LBSNs [C]// Proceedings of the 2013 ACM International Joint Conference on Pervasive and Ubiquitous Computing. Zurich: ACM, 2013: 479-488.
12 YANG D, ZHANG D, YU Z, et al. A sentiment-enhanced personalized location recommendation system [C]// Proceedings of the 24th ACM Conference on Hypertext and Social Media. Paris: ACM, 2013: 119-128.
13 LEY M DBLP: some lessons learned[J]. Proceedings of the VLDB Endowment, 2009, 2 (2): 1493- 1500
doi: 10.14778/1687553.1687577
14 BAST H, BÄURLE F, BUCHHOLD B, et al. Easy access to the freebase dataset [C]// Proceedings of the 23rd International Conference on World Wide Web. Seoul: ACM, 2014: 95-98.
15 AGGARWAL C C, WANG H X. Managing and mining graph data [M]. New York: Springer, 2010.
16 AGRAWAL S, CHAUDHURI S, DAS G. DBXplorer: a system for keyword-based search over relational databases [C]// Proceedings of 18th International Conference on Data Engineering. San Jose: IEEE, 2002: 5-16.
17 DOSSO D. A keyword search and citation system for RDF graphs [C]// FDIA. Milan: [s.n.], 2019: 23-28.
18 GHANBARPOUR A, NADERI H Survey on ranking functions in keyword search over graph-structured data[J]. Journal of Universal Computer Science, 2019, 25 (4): 361- 389
19 YANG Y, AGRAWAL D, JAGADISH H V, et al. An efficient parallel keyword search engine on knowledge graphs [C]// 2019 IEEE 35th International Conference on Data Engineering. Macao: IEEE, 2019: 338-349.
20 XIAO T, HAN D, HE J, et al Multi-keyword ranked search based on mapping set matching in cloud ciphertext storage system[J]. Connection Science, 2021, 33 (1): 95- 112
doi: 10.1080/09540091.2020.1753175
21 HU X, DUAN J, DANG D Natural language question answering over knowledge graph: the marriage of SPARQL query and keyword search[J]. Knowledge and Information Systems, 2021, 63 (4): 819- 844
doi: 10.1007/s10115-020-01534-4
22 FUNG J, TANG F, MANN S. Mediated reality using computer graphics hardware for computer vision [C]// Proceedings of the 6th International Symposium on Wearable Computers. Seattle: IEEE, 2002: 83-89.
23 FUNG J, MANN S. Computer vision signal processing on graphics processing units [C]// 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing. Montreal: IEEE, 2004, 5: V-93.
24 CHITTY D M. A data parallel approach to genetic programming using programmable graphics hardware [C]// Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation. London: ACM, 2007: 1566-1573.
25 SOMAN J, KISHORE K, NARAYANAN P J. A fast GPU algorithm for graph connectivity [C]// 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum. Atlanta: IEEE, 2010: 1-8.
26 VINEET V, NARAYANAN P J. CUDA cuts: fast graph cuts on the GPU [C]// 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. Anchorage: IEEE, 2008: 1-8.
27 HAWICK K A, LEIST A, PLAYNE D P Parallel graph component labelling with GPUs and CUDA[J]. Parallel Computing, 2010, 36 (12): 655- 678
doi: 10.1016/j.parco.2010.07.002
28 OSAMA M, TRUONG M, YANG C, et al. Graph coloring on the GPU [C]// 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). Rio de Janeiro: IEEE, 2019: 231-240.
29 AWAD M A, ASHKIANI S, PORUMBESCU S D, et al. Dynamic graphs on the GPU [C]// 2020 IEEE International Parallel and Distributed Processing Symposium. New Orleans: IEEE, 2020: 739-748.
30 ZHOU C, ZHANG T High performance graph data imputation on multiple GPUs[J]. Future Internet, 2021, 13 (2): 36
doi: 10.3390/fi13020036
31 MIN S W, WU K, HUANG S, et al. PyTorch-direct: enabling GPU centric data access for very large graph neural network training with irregular accesses[EB/OL]. [2021-06-01]. https://arxiv.org/abs/2101.07956.
32 KHORASANI F, VORA K, GUPTA R, et al. CuSha: vertex-centric graph processing on GPUs [C]// Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing. Vancouver BC: ACM, 2014: 239-252.
33 ZHU H, HE L, FU S, et al Wolfpath: accelerating iterative traversing-based graph processing algorithms on GPU[J]. International Journal of Parallel Programming, 2019, 47 (4): 644- 667
doi: 10.1007/s10766-017-0533-y
[1] 董思含,信俊昌,郝琨,姚钟铭,陈金义. 多区块链环境下的连接查询优化算法[J]. 浙江大学学报(工学版), 2022, 56(2): 313-321.
[2] 戴天伦,李博涵,臧亚磊,戴华,于自强,陈钢. PORP:面向无人驾驶的路径规划并行优化策略[J]. 浙江大学学报(工学版), 2022, 56(2): 329-337.
[3] 龙立,郑山锁,周炎,贺金川,孟宏立,蔡永龙. 基于拟蒙特卡罗方法的供水管网抗震可靠性分析并行化研究[J]. 浙江大学学报(工学版), 2020, 54(2): 241-247.
[4] 周佳立, 陈以军, 武敏. 基于FPGA监听的图像采集与预处理方法[J]. 浙江大学学报(工学版), 2018, 52(2): 398-405.
[5] 常超, 刘克胜, 谭龙丹, 贾文超. 基于图模型的C程序数据流分析[J]. 浙江大学学报(工学版), 2017, 51(5): 1007-1015.
[6] 季长清,余胜,王宝凤,陶帅,汪祖民,王润方. 移动云计算环境下的双色反近邻查询算法[J]. 浙江大学学报(工学版), 2016, 50(7): 1330-1337.
[7] 蔡青林,陈岭,梅寒蕾,孙建伶. 基于两级过滤的时间序列近似查询[J]. 浙江大学学报(工学版), 2016, 50(7): 1290-1297.
[8] 卢帅兵,庞建民,单征,岳峰. 基于QEMU的跨平台静态二进制翻译系统[J]. 浙江大学学报(工学版), 2016, 50(1): 158-165.
[9] 胡德超, 池龙哲, 杨琼, 王敏. 水库坝区冲刷漏斗的形成机理[J]. 浙江大学学报(工学版), 2015, 49(2): 257-264.
[10] 王骕,胡浩基,于慧敏,DAMPER R I. 基于数字水印的人脸与声纹融合识别算法[J]. 浙江大学学报(工学版), 2015, 49(1): 6-14.
[11] 金苍宏,吴明晖,应晶. 一种基于上下文索引的文本匹配框架[J]. J4, 2013, 47(9): 1537-1546.
[12] 陈楠,寿黎但,陈刚,陈珂,胡天磊. 面向动态环境的移动对象自适应索引方法[J]. J4, 2013, 47(3): 442-448.
[13] 吴羽,寿黎但,陈刚. CB-LSH:基于压缩位图的高性能LSH索引算法[J]. J4, 2012, 46(3): 377-385.
[14] 江锦华,吴羽,胡天磊,陈刚. 基于路径连接的XML复杂小枝模式查询处理[J]. J4, 2011, 45(1): 1-8.
[15] 周佳庆, 吴羽, 江锦华, 陈刚,董轶. 实时垂直搜索引擎对象缓存优化策略[J]. J4, 2011, 45(1): 14-19.