Please wait a minute...
Journal of ZheJiang University (Engineering Science)  2022, Vol. 56 Issue (2): 288-296    DOI: 10.3785/j.issn.1008-973X.2022.02.009
    
Learning query optimization method based on multi model outside database
Guang-long LI(),De-rong SHEN*(),Tie-zheng NIE,Yue KOU
School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China
Download: HTML     PDF(881KB) HTML
Export: BibTeX | EndNote (RIS)      

Abstract  

For AI and database optimization problems, existing technologies need to change the bottom layer of database, which affects the application of research results and lacks scalability. A learning query optimization method for non-embedded database was proposed. In the cardinality estimation stage, the multi-model method is used to establish a neural network for specific sub queries and train different sub models independently, which solves the problem of too many training sets and poor scalability. In the join optimization stage, cost-based reinforcement learning is applied to improve the query optimization performance. For each query, the optimization processes from cardinality estimation to connection sorting are executed outside the database. The query is rewritten according to the obtained optimization strategy, and the rewriting results are returned to the database. The query is executed according to the specified plan by setting parameters. Experimental verification was carried out on the data set containing eight tables. Compared with the query not optimized, the optimization method of non-embedded database has good optimization effect.



Key wordsquery optimization      cardinality estimation      join order      neural network      reinforcement learning     
Received: 19 July 2021      Published: 03 March 2022
CLC:  TP 3  
Corresponding Authors: De-rong SHEN     E-mail: liguanglong1996@163.com;shenderong@cse.neu.edu.cn
Cite this article:

Guang-long LI,De-rong SHEN,Tie-zheng NIE,Yue KOU. Learning query optimization method based on multi model outside database. Journal of ZheJiang University (Engineering Science), 2022, 56(2): 288-296.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2022.02.009     OR     https://www.zjujournals.com/eng/Y2022/V56/I2/288


数据库外基于多模型的学习式查询优化方法

对于AI与数据库优化问题,现有技术均须改动数据库底层,影响研究成果的应用且缺乏可扩展性. 提出一种非嵌入数据库的学习式查询优化方法. 在基数估计阶段,使用多模型的方法,对特定的子查询建立神经网络,独立训练不同的子模型,解决需要训练集过多且可扩展性差的问题;在连接优化阶段,应用基于代价的强化学习方法,提高查询优化性能. 针对每个查询,从基数估计到连接排序的优化过程都在数据库外执行,按照得到的优化策略对查询重写,并将重写结果返回到数据库中,通过设置参数使该查询按照指定的计划执行. 在包含8个表的数据集上进行实验验证,与未进行优化的查询进行比较,非嵌入数据库的优化方法具有良好的优化效果.


关键词: 查询优化,  基数估计,  连接排序,  神经网络,  强化学习 
Fig.1 Query processing flow
Fig.2 Optimization framework outside database
Fig.3 Cardinality estimation based on neural networks
Fig.4 Example of sub association graph for tables
Fig.5 Coding comparison based on GCM and MCM
表名 行数 可连接的表的序号 序号
lineitem 6001215 ②, ⑥
orders 1500000 ①, ③
customer 150000 ②, ④
nation 25 ③, ⑧
Supplier 10000
partsupp 800000 ⑤, ⑦
part 200000
region 5
Tab.1 Tables of cardinality estimation and join optimization experiments
关联图包含的表 Pgsql偏差值 MCM偏差值 比率
表① 29049 27207 1.06
表② 3583 3253 1.10
表①, ② 315276 95562 3.30
表①, ⑥ 1031819 181906 5.70
表①, ②, ⑥ 1513111 172456 8.80
表①, ②, ③ 1724555 203435 8.50
表①, ②, ③, ⑥ 2323443 249033 9.30
Tab.2 Accuracy comparison of cardinality estimation
Fig.6 Effect of training set on cardinality estimation in case of two tables
Fig.7 Effect of training set on cardinality estimation in case of four tables
实验设置 tu/s to/s
模式1-3joins 35.1 34.3
模式1-4joins 95.5 75.4
模式1-5joins 105.7 92.0
模式2-3joins 41.1 38.3
模式2-4joins 100.2 88.6
模式2-5joins 111.0 99.8
Tab.3 Comparison of query execution efficiency between non-optimized and optimized queries
[1]   ARMSTRONG K Big data: a revolution that will transform how we live, work, and think[J]. Mathematics and Computer Education, 2014, 47 (10): 181- 183
[2]   李国良, 周煊赫, 孙佶, 等 基于机器学习的数据库技术综述[J]. 计算机学报, 2020, 43 (11): 2019- 2049
LI Guo-liang, ZHOU Xuan-he, SUN Ji, et al A survey of machine learning based database techniques[J]. Chinese Journal of Computers, 2020, 43 (11): 2019- 2049
[3]   DURAND M, FLAJOLET P. Loglog counting of large cardinalities [C]// 11th Annual European Symposium on Algorithms. Budapest: Springer, 2003.
[4]   FLAJOLET P, FUSY É, GANDOUET O, et al. Hyperloglog: the analysis of a near-optimal cardinality estimation algorithm [C]// Discrete Mathematics and Theoretical Computer Science. Nancy: [s.n.], 2007: 137-156.
[5]   WU C, JINDAL A, AMIZADEH S, et al Towards a learning optimizer for shared clouds[J]. Proceedings of the VLDB Endowment, 2018, 12 (3): 210- 222
doi: 10.14778/3291264.3291267
[6]   LAKSHMI M S , ZHOU S. Selectivity estimation in extensible databases: a neural network approach [C]// Proceedings of 24th International Conference on Very Large Data Bases. New York: DBLP, 1998.
[7]   DUTT A, WANG C, NAZI A, et al Selectivity estimation for range predicates using lightweight models[J]. Proceedings of the VLDB Endowment, 2019, 12 (9): 1044- 1057
doi: 10.14778/3329772.3329780
[8]   HASAN S, THIRUMURUGANATHAN S, AUGUSTINE J, et al. Multi-attribute selectivity estimation using deep learning [EB/OL]. (2019-06-17). https://arxiv.org/abs/1903.09999.
[9]   KIPF A, KIPF T, RADKE B, et al. Learned cardinalities: estimating correlated joins with deep learning [EB/OL]. (2018-12-08). https://arxiv.org/abs/1809.00677.
[10]   SUN, LI G An end-to-end learning-based cost estimator[J]. Proceedings of the VLDB Endowment, 2019, 13 (3): 307- 319
doi: 10.14778/3368289.3368296
[11]   SELINGER P G, ASTRAHAN M M, CHAMBERLIN D D, et al. Access path selection in a relational database management system [C]// International Conference on Management of Data (SIGMOD). Boston: ACM, 1979: 23-34.
[12]   VANCE B, MAIER D. Rapid bushy join-order optimization with cartesian products [C]// International Conference on Management of Data. Montreal: ACM, 1996:35-46.
[13]   WAAS F, PELLENKOFT A. Join order selection (good enough is easy) [C]// British National Conference on Databases: Advances in Databases. Exeter: Springer, 2000: 51-67.
[14]   NEUMANN T. Query simplification: graceful degradation for join-order optimization [C]// International Conference on Management of Data (SIGMOD). Providence: ACM, 2009: 403-414.
[15]   KRISHNAN S, YANG Z, GOLDBERG K, et al. Deep reinforcement learning for join order enumeration [EB/OL]. (2018-02-28). https://arxiv.org/abs/1808.03196.
[16]   RYAN M, OLGA P, et al. Learned cardinalities: estimating correlated joins with deep learning [EB/OL]. (2018-12-08). https://arxiv.org/abs/1803.00055.
[17]   TRUMMER I, WANG J, MARAM D, et al. Skinnerdb: regret-bounded query evaluation via reinforcement learning[EB/OL]. (2019-01-16). https://arxiv.org/abs/1901.05152.
[18]   LEIS V, GUBICHEV A, MIRCHEV A, et al How good are query optimizers, really?[J]. Proceedings of the VLDB Endowment, 2015, 9 (3): 204- 215
doi: 10.14778/2850583.2850594
[1] Xiao-gao XU,Ying-jie XIA,Si-yu ZHU,Li KUANG. Cooperative control algorithm of multi-intersection variable-direction lanes based on reinforcement learning[J]. Journal of ZheJiang University (Engineering Science), 2022, 56(5): 987-994, 1005.
[2] Li HE,Shan-min PANG. Face reconstruction from voice based on age-supervised learning and face prior information[J]. Journal of ZheJiang University (Engineering Science), 2022, 56(5): 1006-1016.
[3] Yun-hao WANG,Ming-hui SUN,Yi XIN,Bo-xuan ZHANG. Robot tactile recognition system based on piezoelectric film sensor[J]. Journal of ZheJiang University (Engineering Science), 2022, 56(4): 702-710.
[4] Ze-kang WU,Shan ZHAO,Hong-wei LI,Yi-rui JIANG. Spatial global context information network for semantic segmentation of remote sensing image[J]. Journal of ZheJiang University (Engineering Science), 2022, 56(4): 795-802.
[5] Yang-zhao CHEN,Wei-na YUAN. Deep learning aided multi-user detection for up-link grant-free NOMA[J]. Journal of ZheJiang University (Engineering Science), 2022, 56(4): 816-822.
[6] Ke-wen ZHANG,Bai-song PAN. Control design of spacecraft autonomous rendezvous using nonlinear models with uncertainty[J]. Journal of ZheJiang University (Engineering Science), 2022, 56(4): 833-842.
[7] Ruo-ran CHENG,Xiao-li ZHAO,Hao-jun ZHOU,Han-chen YE. Review of Chinese font style transfer research based on deep learning[J]. Journal of ZheJiang University (Engineering Science), 2022, 56(3): 510-519, 530.
[8] Ting WANG,Xiao-fei ZHU,Gu TANG. Knowledge-enhanced graph convolutional neural networks for text classification[J]. Journal of ZheJiang University (Engineering Science), 2022, 56(2): 322-328.
[9] Pei-zhi WEN,Jun-mou CHEN,Yan-nan XIAO,Ya-yuan WEN,Wen-ming HUANG. Underwater image enhancement algorithm based on GAN and multi-level wavelet CNN[J]. Journal of ZheJiang University (Engineering Science), 2022, 56(2): 213-224.
[10] Song REN,Qian-wen ZHU,Xin-yue TU,Chao DENG,Xiao-shu WANG. Lining disease identification of highway tunnel based on deep learning[J]. Journal of ZheJiang University (Engineering Science), 2022, 56(1): 92-99.
[11] Fa-ming HUANG,Li-han PAN,Chi YAO,Chuang-bing ZHOU,Qing-hui JIANG,Zhi-lu CHANG. Landslide susceptibility prediction modelling based on semi-supervised machine learning[J]. Journal of ZheJiang University (Engineering Science), 2021, 55(9): 1705-1713.
[12] Nan ZHANG,Hong-zhao DONG,Yi-ni SHE. Seq2Seq prediction of bus trajectory on exclusive bus lanes[J]. Journal of ZheJiang University (Engineering Science), 2021, 55(8): 1482-1489.
[13] Fei WANG,Wei-xiang XU. Improved model of road impedance function based on LSTM neural network[J]. Journal of ZheJiang University (Engineering Science), 2021, 55(6): 1065-1071.
[14] Yan HAO,Ya-bin DING,Jin-sheng FU. Hierarchical closed-loop optimization strategy for cumulative error of robot machining system[J]. Journal of ZheJiang University (Engineering Science), 2021, 55(6): 1142-1149.
[15] Yi-fan MA,Fan-yu ZHAO,Xin WANG,Zhong-he JIN. Agile imaging satellite task planning method for intensive observation[J]. Journal of ZheJiang University (Engineering Science), 2021, 55(6): 1215-1224.