Please wait a minute...
浙江大学学报(工学版)  2022, Vol. 56 Issue (2): 288-296    DOI: 10.3785/j.issn.1008-973X.2022.02.009
计算机与控制工程     
数据库外基于多模型的学习式查询优化方法
李广龙(),申德荣*(),聂铁铮,寇月
东北大学 计算机科学与工程学院,辽宁 沈阳 110169
Learning query optimization method based on multi model outside database
Guang-long LI(),De-rong SHEN*(),Tie-zheng NIE,Yue KOU
School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China
 全文: PDF(881 KB)   HTML
摘要:

对于AI与数据库优化问题,现有技术均须改动数据库底层,影响研究成果的应用且缺乏可扩展性. 提出一种非嵌入数据库的学习式查询优化方法. 在基数估计阶段,使用多模型的方法,对特定的子查询建立神经网络,独立训练不同的子模型,解决需要训练集过多且可扩展性差的问题;在连接优化阶段,应用基于代价的强化学习方法,提高查询优化性能. 针对每个查询,从基数估计到连接排序的优化过程都在数据库外执行,按照得到的优化策略对查询重写,并将重写结果返回到数据库中,通过设置参数使该查询按照指定的计划执行. 在包含8个表的数据集上进行实验验证,与未进行优化的查询进行比较,非嵌入数据库的优化方法具有良好的优化效果.

关键词: 查询优化基数估计连接排序神经网络强化学习    
Abstract:

For AI and database optimization problems, existing technologies need to change the bottom layer of database, which affects the application of research results and lacks scalability. A learning query optimization method for non-embedded database was proposed. In the cardinality estimation stage, the multi-model method is used to establish a neural network for specific sub queries and train different sub models independently, which solves the problem of too many training sets and poor scalability. In the join optimization stage, cost-based reinforcement learning is applied to improve the query optimization performance. For each query, the optimization processes from cardinality estimation to connection sorting are executed outside the database. The query is rewritten according to the obtained optimization strategy, and the rewriting results are returned to the database. The query is executed according to the specified plan by setting parameters. Experimental verification was carried out on the data set containing eight tables. Compared with the query not optimized, the optimization method of non-embedded database has good optimization effect.

Key words: query optimization    cardinality estimation    join order    neural network    reinforcement learning
收稿日期: 2021-07-19 出版日期: 2022-03-03
CLC:  TP 3  
基金资助: 国家自然科学基金资助项目(62172082 , 62072084, 62072086)
通讯作者: 申德荣     E-mail: liguanglong1996@163.com;shenderong@cse.neu.edu.cn
作者简介: 李广龙(1996—),男,硕士生,从事数据库相关研究. oricid.org/0000-0002-8201-5729. E-mail: liguanglong1996@163.com
服务  
把本文推荐给朋友
加入引用管理器
E-mail Alert
作者相关文章  
李广龙
申德荣
聂铁铮
寇月

引用本文:

李广龙,申德荣,聂铁铮,寇月. 数据库外基于多模型的学习式查询优化方法[J]. 浙江大学学报(工学版), 2022, 56(2): 288-296.

Guang-long LI,De-rong SHEN,Tie-zheng NIE,Yue KOU. Learning query optimization method based on multi model outside database. Journal of ZheJiang University (Engineering Science), 2022, 56(2): 288-296.

链接本文:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2022.02.009        https://www.zjujournals.com/eng/CN/Y2022/V56/I2/288

图 1  查询处理流程
图 2  数据库外的优化框架
图 3  基于神经网络的基数估计
图 4  表的子关联图示例
图 5  基于GCM和MCM的编码对比
表名 行数 可连接的表的序号 序号
lineitem 6001215 ②, ⑥
orders 1500000 ①, ③
customer 150000 ②, ④
nation 25 ③, ⑧
Supplier 10000
partsupp 800000 ⑤, ⑦
part 200000
region 5
表 1  基数估计与连接优化实验的表
关联图包含的表 Pgsql偏差值 MCM偏差值 比率
表① 29049 27207 1.06
表② 3583 3253 1.10
表①, ② 315276 95562 3.30
表①, ⑥ 1031819 181906 5.70
表①, ②, ⑥ 1513111 172456 8.80
表①, ②, ③ 1724555 203435 8.50
表①, ②, ③, ⑥ 2323443 249033 9.30
表 2  基数估计值准确度对比
图 6  2个表的情况下训练集对基数估计的影响
图 7  4个表的情况下训练集对基数估计的影响
实验设置 tu/s to/s
模式1-3joins 35.1 34.3
模式1-4joins 95.5 75.4
模式1-5joins 105.7 92.0
模式2-3joins 41.1 38.3
模式2-4joins 100.2 88.6
模式2-5joins 111.0 99.8
表 3  未优化与优化后的查询执行效率对比
1 ARMSTRONG K Big data: a revolution that will transform how we live, work, and think[J]. Mathematics and Computer Education, 2014, 47 (10): 181- 183
2 李国良, 周煊赫, 孙佶, 等 基于机器学习的数据库技术综述[J]. 计算机学报, 2020, 43 (11): 2019- 2049
LI Guo-liang, ZHOU Xuan-he, SUN Ji, et al A survey of machine learning based database techniques[J]. Chinese Journal of Computers, 2020, 43 (11): 2019- 2049
3 DURAND M, FLAJOLET P. Loglog counting of large cardinalities [C]// 11th Annual European Symposium on Algorithms. Budapest: Springer, 2003.
4 FLAJOLET P, FUSY É, GANDOUET O, et al. Hyperloglog: the analysis of a near-optimal cardinality estimation algorithm [C]// Discrete Mathematics and Theoretical Computer Science. Nancy: [s.n.], 2007: 137-156.
5 WU C, JINDAL A, AMIZADEH S, et al Towards a learning optimizer for shared clouds[J]. Proceedings of the VLDB Endowment, 2018, 12 (3): 210- 222
doi: 10.14778/3291264.3291267
6 LAKSHMI M S , ZHOU S. Selectivity estimation in extensible databases: a neural network approach [C]// Proceedings of 24th International Conference on Very Large Data Bases. New York: DBLP, 1998.
7 DUTT A, WANG C, NAZI A, et al Selectivity estimation for range predicates using lightweight models[J]. Proceedings of the VLDB Endowment, 2019, 12 (9): 1044- 1057
doi: 10.14778/3329772.3329780
8 HASAN S, THIRUMURUGANATHAN S, AUGUSTINE J, et al. Multi-attribute selectivity estimation using deep learning [EB/OL]. (2019-06-17). https://arxiv.org/abs/1903.09999.
9 KIPF A, KIPF T, RADKE B, et al. Learned cardinalities: estimating correlated joins with deep learning [EB/OL]. (2018-12-08). https://arxiv.org/abs/1809.00677.
10 SUN, LI G An end-to-end learning-based cost estimator[J]. Proceedings of the VLDB Endowment, 2019, 13 (3): 307- 319
doi: 10.14778/3368289.3368296
11 SELINGER P G, ASTRAHAN M M, CHAMBERLIN D D, et al. Access path selection in a relational database management system [C]// International Conference on Management of Data (SIGMOD). Boston: ACM, 1979: 23-34.
12 VANCE B, MAIER D. Rapid bushy join-order optimization with cartesian products [C]// International Conference on Management of Data. Montreal: ACM, 1996:35-46.
13 WAAS F, PELLENKOFT A. Join order selection (good enough is easy) [C]// British National Conference on Databases: Advances in Databases. Exeter: Springer, 2000: 51-67.
14 NEUMANN T. Query simplification: graceful degradation for join-order optimization [C]// International Conference on Management of Data (SIGMOD). Providence: ACM, 2009: 403-414.
15 KRISHNAN S, YANG Z, GOLDBERG K, et al. Deep reinforcement learning for join order enumeration [EB/OL]. (2018-02-28). https://arxiv.org/abs/1808.03196.
16 RYAN M, OLGA P, et al. Learned cardinalities: estimating correlated joins with deep learning [EB/OL]. (2018-12-08). https://arxiv.org/abs/1803.00055.
17 TRUMMER I, WANG J, MARAM D, et al. Skinnerdb: regret-bounded query evaluation via reinforcement learning[EB/OL]. (2019-01-16). https://arxiv.org/abs/1901.05152.
18 LEIS V, GUBICHEV A, MIRCHEV A, et al How good are query optimizers, really?[J]. Proceedings of the VLDB Endowment, 2015, 9 (3): 204- 215
doi: 10.14778/2850583.2850594
[1] 徐小高,夏莹杰,朱思雨,邝砾. 基于强化学习的多路口可变车道协同控制方法[J]. 浙江大学学报(工学版), 2022, 56(5): 987-994, 1005.
[2] 何立,庞善民. 结合年龄监督和人脸先验的语音-人脸图像重建[J]. 浙江大学学报(工学版), 2022, 56(5): 1006-1016.
[3] 王云灏,孙铭会,辛毅,张博宣. 基于压电薄膜传感器的机器人触觉识别系统[J]. 浙江大学学报(工学版), 2022, 56(4): 702-710.
[4] 吴泽康,赵姗,李宏伟,姜懿芮. 遥感图像语义分割空间全局上下文信息网络[J]. 浙江大学学报(工学版), 2022, 56(4): 795-802.
[5] 陈扬钊,袁伟娜. 深度学习辅助上行免调度NOMA多用户检测方法[J]. 浙江大学学报(工学版), 2022, 56(4): 816-822.
[6] 张科文,潘柏松. 考虑非线性模型不确定性的航天器自主交会控制[J]. 浙江大学学报(工学版), 2022, 56(4): 833-842.
[7] 程若然,赵晓丽,周浩军,叶翰辰. 基于深度学习的中文字体风格转换研究综述[J]. 浙江大学学报(工学版), 2022, 56(3): 510-519, 530.
[8] 王婷,朱小飞,唐顾. 基于知识增强的图卷积神经网络的文本分类[J]. 浙江大学学报(工学版), 2022, 56(2): 322-328.
[9] 温佩芝,陈君谋,肖雁南,温雅媛,黄文明. 基于生成式对抗网络和多级小波包卷积网络的水下图像增强算法[J]. 浙江大学学报(工学版), 2022, 56(2): 213-224.
[10] 任松,朱倩雯,涂歆玥,邓超,王小书. 基于深度学习的公路隧道衬砌病害识别方法[J]. 浙江大学学报(工学版), 2022, 56(1): 92-99.
[11] 黄发明,潘李含,姚池,周创兵,姜清辉,常志璐. 基于半监督机器学习的滑坡易发性预测建模[J]. 浙江大学学报(工学版), 2021, 55(9): 1705-1713.
[12] 张楠,董红召,佘翊妮. 公交专用道条件下公交车辆轨迹的Seq2Seq预测[J]. 浙江大学学报(工学版), 2021, 55(8): 1482-1489.
[13] 王飞,徐维祥. 基于LSTM神经网络改进的路阻函数模型[J]. 浙江大学学报(工学版), 2021, 55(6): 1065-1071.
[14] 郝晏,丁雅斌,付津昇. 机器人加工系统累积误差逐级闭环优化策略[J]. 浙江大学学报(工学版), 2021, 55(6): 1142-1149.
[15] 马一凡,赵凡宇,王鑫,金仲和. 密集观测场景下的敏捷成像卫星任务规划方法[J]. 浙江大学学报(工学版), 2021, 55(6): 1215-1224.