SQL generation from natural language queries with complex calculations on financial data

doi:10.3785/j.issn.1008-973X.2023.02.008

Journal of ZheJiang University (Engineering Science)

2023, Vol. 57

Issue (2): 277-286 DOI: 10.3785/j.issn.1008-973X.2023.02.008

SQL generation from natural language queries with complex calculations on financial data

Jia-hao HE(

),Xi-ping LIU*(

),Qing SHU,Chang-xuan WAN,De-xi LIU,Guo-qiong LIAO

School of Information Management, Jiangxi University of Finance and Economics, Nanchang 330013, China

Download:

HTML

PDF(739KB) HTML
Export: BibTeX | EndNote (RIS)

Abstract

The problem of structured query language (SQL) generation from natural language queries (Text-to-SQL) in financial domain was investigated. First, SOFT, a Text-to-SQL dataset in the financial domain was constructed. The dataset covered common queries in the financial domain with distinctive features and presented challenges to Text-to-SQL research. Then, FinSQL, a Text-to-SQL model, which optimized the support for complex queries in the financial domain, was proposed. In particular, by analyzing the characteristics of row calculation queries, a class of queries with complex numerical calculations, a divide-and-conquer based method was proposed. A row calculation query was divided into several subqueries, the SQL statement for each subquery was generated, and the SQL statements were finally combined into together to get the SQL statement for the original query. Experimental results on SOFT dataset show that the proposed FinSQL model outperforms existing methods for the hard queries, and performs well for row calculation queries.

Key words： Text-to-SQL natural language query financial field row calculation query divide-and-conquer method

Received: 31 July 2022 Published: 02 December 2022

CLC:

TP 391

Fund: 国家自然科学基金资助项目(62076112, 61972184); 江西省自然科学基金资助项目(20192BAB207017); 江西省教育厅科学技术研究资助项目(GJJ190255); 江西省研究生创新专项资金项目(YC2021-B130)

Corresponding Authors: Xi-ping LIU E-mail: hejiahao810@126.com;liuxiping@jxufe.edu.cn

	Service
	E-mail this article
	Add to my bookshelf
	Add to citation manager
	E-mail Alert
	RSS
	Articles by authors
	Jia-hao HE
	Xi-ping LIU
	Qing SHU
	Chang-xuan WAN
	De-xi LIU
	Guo-qiong LIAO

Cite this article:

Jia-hao HE,Xi-ping LIU,Qing SHU,Chang-xuan WAN,De-xi LIU,Guo-qiong LIAO. SQL generation from natural language queries with complex calculations on financial data. Journal of ZheJiang University (Engineering Science), 2023, 57(2): 277-286.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2023.02.008 OR https://www.zjujournals.com/eng/Y2023/V57/I2/277

带复杂计算的金融领域自然语言查询的SQL生成

研究金融领域基于自然语言查询的结构化查询语言(SQL)生成问题(Text-to-SQL), 构建一个金融领域Text-to-SQL数据集，称为SOFT数据集. 该数据集覆盖了金融领域的常见查询，具有鲜明的特点，并对Text-to-SQL提出了挑战. 提出金融领域Text-to-SQL模型FinSQL，该模型优化了对金融领域复杂查询的支持. 通过分析一类复杂计算查询（行计算查询）的特点，提出一种基于分治的方法，即先将一个行计算查询分解为若干个子查询，分别针对每个子查询生成SQL语句，再将子查询的SQL语句组合在一起得到原始查询的SQL语句. 在SOFT数据集上进行验证，结果显示，本研究所提的方法在复杂查询上效果优于已有方法. 特别地，所提出的模型FinSQL能够较好地支持行计算查询.

关键词： Text-to-SQL, 自然语言查询, 金融领域, 行计算查询, 分治方法

Tab.1 SOFT dataset example

Tab.2 Proportions of query types for each dataset

Tab.3 Exact-match accuracy on SOFT dataset

Tab.4 Comparison of model prediction results for some samples

Fig.1 Overall structure of FinSQL

Tab.5 Possible slot values for each component

Tab.6 Question types and combination rules

Tab.7 Problem identification experimental results

Tab.8 Comparison on problem decomposition results

Tab.9 Comparison on Text-to-SQL model experimental results

Tab.10 Experimental results of FinSQL on SOFT-rc

Tab.11 Comparison of prediction results on some datasets

Fig.2 Comparison on Text-to-SQL model response time


[1]	曹金超, 黄滔, 陈刚, 等自然语言生成多表SQL查询语句技术研究[J]. 计算机科学与探索, 2020, 14 (7): 1133- 1141 CAO Jin-chao, HUANG Tao, CHEN Gang, et al Research on technology of generating multi-table SQL query statement by natural language[J]. Journal of Frontiers of Computer Science and Technology, 2020, 14 (7): 1133- 1141 doi: 10.3778/j.issn.1673-9418.1908025

[2]	潘璇, 徐思涵, 蔡祥睿, 等基于深度学习的数据库自然语言接口综述[J]. 计算机研究与发展, 2021, 58 (9): 1925- 1950 PAN Xuan, XU Si-han, CAI Xiang-rui, et al Survey on deep learning based natural language interface to database[J]. Journal of Computer Research and Development, 2021, 58 (9): 1925- 1950 doi: 10.7544/issn1000-1239.2021.20200209

[3]	ZHONG V, XIANG C, SOCHER R. Seq2SQL: generating structured queries from natural language using reinforcement learning [EB/OL]. (2017-11-20). http://arxiv.org/abs/1709.00103.

[4]	YU T, ZHANG R, YANG K, et al. Spider: a large-scale human-labeled dataset for complex and cross-domain semantic parsing and Text-To-SQL task [EB/OL]. (2019-02-02). http://arxiv.org/abs/1809.08887.

[5]	百度NLP. 语义解析(Text-to-SQL)技术研究及应用 [EB/OL]. [2022-07-01]. https://zhuanlan.zhihu.com/p/269478469.

[6]	BAIK C, JAGADISH H V, LI Y. Bridging the semantic gap with SQL query logs in natural language interfaces to databases [C]// Proceedings of ICDE 2019. Macao: IEEE, 2019: 374–385.

[7]	LI F, JAGADISH H V Constructing an interactive natural language interface for relational databases[J]. Proceedings of the VLDB Endowment, 2014, 8 (1): 73- 84 doi: 10.14778/2735461.2735468

[8]	SONG D, SCHILDER F, SMILEY C, et al. TR discover: a natural language interface for querying and analyzing interlinked datasets [C] // Proceedings of ISWC 2015. Bethlehem: Springer, 2015: 21–37.

[9]	YAGHMAZADEH N, WANG Y, DILLIG I, et al. SQLizer: query synthesis from natural language [C]// Proceedings of the ACM on Programming Languages. New York: ACM. 2017: 1-26.

[10]	BAHDANAU D, CHO K, BENGIO Y. Neural machine translation by jointly learning to align and translate [EB/OL]. (2016-05-19). http://arxiv.org/abs/1409.0473.

[11]	XU X, LIU C, SONG D. SQLNet: generating structured queries from natural language without reinforcement learning [EB/OL]. (2017-11-20). https://arxiv.org/abs/1711.04436.

[12]	YU T, YASUNAGA M, YANG K, et al. SyntaxSQLNet: syntax tree networks for complex and cross-domain Text-to-SQL task [C]// Proceedings of EMNLP 2018. Brussels: ACL, 2018: 1653–1663.

[13]	LEI W, WANG W, MA Z, et al. Re-examining the role of schema linking in Text-to-SQL [C]// Proceedings of EMNLP 2020. [s.l.]: ACL, 2020: 6943–6954.

[14]	CHOI D, SHIN M C, KIM E, et al RYANSQL: recursively applying sketch-based slot fillings for complex Text-to-SQL in cross-domain databases[J]. Computational Linguistics, 2021, 47 (2): 309- 332

[15]	HEMPHILL C T, GODFREY J J, DODDINGTON G R. The ATIS spoken language systems pilot corpus [EB/OL]. (2020-11-20). https://aclanthology.org/H90-1021.pdf.

[16]	ZELLE J M, MOONEY R J. Learning to parse database queries using inductive logic programming [C]// Proceedings of AAAI 1996. Portland Oregon: AAAI Press, 1996: 1050–1055.

[17]	MIN Q, SHI Y, ZHANG Y. A pilot study for chinese SQL semantic parsing [EB/OL]. (2019-10-16). http://arxiv.org/abs/1909.13293.

[18]	SUN N, YANG X, LIU Y. TableQA: a large-scale chinese Text-to-SQL dataset for table-aware SQL generation [EB/OL]. (2020-06-10). https://arxiv.org/abs/2006.06434.

[19]	LIANG C, YU Y, JIANG H, et al. BOND: BERT-assisted open-domain named entity recognition with distant supervision [C]// Proceedings of ACM SIGKDD 2020. [s.l.]: ACM, 2020: 1054–1064.

[20]	DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding [C]// Proceedings of NAACL 2019. Minneapolis: ACL, 2019: 4171–4186.

[21]	JOULIN A, GRAVE E, BOJANOWSKI P, et al. Bag of tricks for efficient text classification [C]// Proceedings of EACL 2017. Valencia: ACL, 2017: 427–431.

[22]	LIU P, QIU X, HUANG X. Recurrent neural network for text classification with multi-task learning [C]// Proceedings of IJCAI 2016. New York: AAAI Press, 2016: 2873–2879.

[1]	Tian-qi ZHOU,Yan YANG,Ji-jie ZHANG,Shao-wei YIN,Zeng-qiang GUO. Graph contrastive learning based on negative-sample-free loss and adaptive augmentation[J]. Journal of ZheJiang University (Engineering Science), 2023, 57(2): 259-266.

[2]	Fan YANG,Bo NING,Huai-qing LI,Xin ZHOU,Guan-yu LI. Multimodal image retrieval model based on semantic-enhanced feature fusion[J]. Journal of ZheJiang University (Engineering Science), 2023, 57(2): 252-258.

[3]	Feng-long SU,Ning JING. Temporal knowledge graph representation learning based on relational aggregation[J]. Journal of ZheJiang University (Engineering Science), 2023, 57(2): 235-242.

[4]	Chang-chun YANG,Zan-ting YE,Ban-teng LIU,Ke WANG,Hai-dong CUI. Medical image segmentation method based on multi-source information fusion[J]. Journal of ZheJiang University (Engineering Science), 2023, 57(2): 226-234.

[5]	Hua HUANG,Qiu-ge ZHAO,Zai-xing HE,Jia-ran LI. Contour error control of two-axis system based on LSTM and Newton iteration[J]. Journal of ZheJiang University (Engineering Science), 2023, 57(1): 10-20.

[6]	Chen YE,Hong-fei ZHAN,Ying-jun LIN,Jun-he YU,Rui WANG,Wu-chang ZHONG. Design knowledge recommendation based on inference-context-aware activation model[J]. Journal of ZheJiang University (Engineering Science), 2023, 57(1): 32-46.

[7]	Li-zhou FENG,Yang YANG,You-wei WANG,Gui-jun YANG. New method for news recommendation based on Transformer and knowledge graph[J]. Journal of ZheJiang University (Engineering Science), 2023, 57(1): 133-143.

[8]	Jun-chi MA,Xiao-xin DI,Zong-tao DUAN,Lei TANG. Survey on program representation learning[J]. Journal of ZheJiang University (Engineering Science), 2023, 57(1): 155-169.

[9]	Tian-le YUAN,Ju-long YUAN,Yong-jian ZHU,Han-chen ZHENG. Surface defect detection algorithm of thrust ball bearing based on improved YOLOv5[J]. Journal of ZheJiang University (Engineering Science), 2022, 56(12): 2349-2357.

[10]	Qiang GUO,Tian-hao WU,Wei XU,Mykola KALIUZHNY. Target tracking algorithm based on channel reliability and aberrance repression[J]. Journal of ZheJiang University (Engineering Science), 2022, 56(12): 2379-2391.

[11]	Nan-jing YU,Xiao-biao FAN,Tian-min DENG,Guo-tao MAO. Ship detection algorithm in complex backgrounds via multi-head self-attention[J]. Journal of ZheJiang University (Engineering Science), 2022, 56(12): 2392-2402.

[12]	Yu XIE,Zi-qun BAO,Na ZHANG,Biao WU,Xiao-mei TU,Xiao-an BAO. Object detection algorithm based on feature enhancement and deep fusion[J]. Journal of ZheJiang University (Engineering Science), 2022, 56(12): 2403-2415.

[13]	Jia-wei LU,Jia-hong ZHENG,Duan-ni LI,Jun XU,Gang XIAO. Short text optimized topic model for service clustering[J]. Journal of ZheJiang University (Engineering Science), 2022, 56(12): 2416-2425.

[14]	Qiao-hong CHEN,Fei-yu LI,Qi SUN,Yu-bo JIA. Answer selection model based on LSTM and decay self-attention[J]. Journal of ZheJiang University (Engineering Science), 2022, 56(12): 2436-2444.

[15]	Fei LI,Kun HU,Yong ZHANG,Wen-shan WANG,Hao JIANG. Multi-dimensional detection of longitudinal tearing of conveyor belt based on YOLOv4 of hybrid domain attention[J]. Journal of ZheJiang University (Engineering Science), 2022, 56(11): 2156-2167.

Viewed

Full text

Abstract

Cited

Shared

Discussed