Please wait a minute...
浙江大学学报(理学版)  2023, Vol. 50 Issue (2): 204-212    DOI: 10.3785/j.issn.1008-9497.2023.02.010
地球科学     
基于LSM-OCTree的时空流分布式调度和存储方案
李悦艺1,2,张丰1,2(),杜震洪1,2,刘仁义1,2
1.浙江大学 浙江省资源与环境信息系统重点实验室,浙江 杭州 310028
2.浙江大学 地理信息科学研究所,浙江 杭州 310027
Distributed scheduling and storage scheme based on LSM-OCTree for spatiotemporal stream
Yueyi LI1,2,Feng ZHANG1,2(),Zhenhong DU1,2,Renyi LIU1,2
1.Zhejiang Provincial Key Lab of GIS,Zhejiang University,Hangzhou 310028,China
2.Department of Geographic Information Science,Zhejiang University,Hangzhou 310027,China
 全文: PDF(2545 KB)   HTML( 5 )
摘要:

时空流的高效管理要求顾及数据的时空相关性,支持时空流的高速插入、实时索引和低延迟时空范围查询,而现有的基于HBase等的存储方案,因索引更新开销过大,无法满足高效管理要求。针对时空流的应用特性,提出了一种面向时间分片的时空流分布式调度和存储方法。利用时空流的紧耦合性进行数据划分与调度,以减少查询时数据复制的开销。将采用预分区方式构建的基于日志结构合并树的八叉树(octree based on the log-structured merge-tree,LSM-OCTree)索引作为存储结构,保证时空流的索引更新,实现索引的高效批量合并计算,提高查询性能。实验结果表明,时空动态调度策略优于通用的调度方法,LSM-OCTree索引的合并与更新性能优于常规索引结构。与HBase方案相比,基于LSM-OCTree的时空流分布式调度和存储方案的查询效率提升了20%以上。

关键词: 时空流时空调度LSM-OCTree分布式存储    
Abstract:

Efficient management of spatiotemporal stream requires to take spatiotemporal correlation into account and support high-speed insertion, real-time indexing and low delay spatiotemporal range query. However, due to the high cost of index update, existing scheduling and storage schemes based on HBase can hardly meet those requirements. According to the application characteristics of spatiotemporal stream, a time-slicing oriented distributed scheduling and storage method is proposed. The tight coupling of spatiotemporal stream is used for data division and scheduling to reduce the overhead of data replication during query. To achieve both index update performance and query efficiency of spatiotemporal stream, octree based on the log-structured merge-tree (LSM-OCTree) is constructed by pre-partition as the storage structure. Efficient batch merging method is employed to improve the overall query performance. Experimental results show that the spatiotemporal dynamic scheduling strategy is better than the general scheduling method, and the merging and updating performance of LSM-OCTree index is better than that of conventional index structure. Compared with HBase scheme, the query performance of distributed storage scheme based on LSM-OCTree is over 20% better. In summary, the distributed scheduling and storage scheme considers both index update and range query performance, and has high efficiency.

Key words: spatiotemporal stream    spatiotemporal scheduling    LSM-OCTree    distributed storage
收稿日期: 2022-02-14 出版日期: 2023-03-21
CLC:  P 208  
基金资助: 国家自然科学基金资助项目(42271466);国家重点研发计划重点专项(2018YFB0505000);高分综合交通遥感应用示范系统(二期)(07-Y30B30-9001-19/21)
通讯作者: 张丰     E-mail: zfcarnation@zju.edu.cn
作者简介: 李悦艺(1997—),ORCID:https://orcid.org/0000-0001-9874-3491,女,硕士研究生,主要从事时空流管理研究.
服务  
把本文推荐给朋友
加入引用管理器
E-mail Alert
RSS
作者相关文章  
李悦艺
张丰
杜震洪
刘仁义

引用本文:

李悦艺,张丰,杜震洪,刘仁义. 基于LSM-OCTree的时空流分布式调度和存储方案[J]. 浙江大学学报(理学版), 2023, 50(2): 204-212.

Yueyi LI,Feng ZHANG,Zhenhong DU,Renyi LIU. Distributed scheduling and storage scheme based on LSM-OCTree for spatiotemporal stream. Journal of Zhejiang University (Science Edition), 2023, 50(2): 204-212.

链接本文:

https://www.zjujournals.com/sci/CN/10.3785/j.issn.1008-9497.2023.02.010        https://www.zjujournals.com/sci/CN/Y2023/V50/I2/204

图1  分布式时空调度
图2  LSM-OCTree结构
图3  LSM-OCTree合并方法
图4  面向时间分片的分布式LSM-OCTree构建
图5  实验数据空间分布
图6  3种分发方案的查询性能对比
图7  采用时空调度和常规调度的连续查询任务计算速率对比
图8  3种合并方式的性能对比
图9  LSM-OCTree与常规八叉树的更新效率对比
图10  3种存储方案的查询性能对比
1 KAZEMITABAR S J, DEMIRYUREK U, ALI M, et al. Geospatial stream query processing using Microsoft SQL Server Stream Insight[J]. Proceedings of the VLDB Endowment, 2010, 3(1/2): 1537-1540. DOI:10.14778/1920841.1921032
doi: 10.14778/1920841.1921032
2 杨良怀, 沈东海, 范玉雷, 等. 面向时空流的移动对象空间索引构建[J]. 电子学报, 2021, 49(5): 992-1000. DOI:10.12263/DZXB.20200300
YANG L H, SHEN D H, FAN Y L, et al. A moving object spatial index for spatio-temporal data stream[J]. Acta Electronica Sinica, 2021, 49(5): 992-1000. DOI:10.12263/DZXB.20200300
doi: 10.12263/DZXB.20200300
3 CARBONE P, KATSIFODIMOS A, EWEN S, et al. Apache FlinkTM: Stream and batch processing in a single engine[J]. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 2015, 36(4): 28-38.
4 ZHANG F, ZHENG Y, XU D P, et al. Real-time spatial queries for moving objects using storm topology[J]. ISPRS International Journal of Geo-Information, 2016, 5(10): 178. DOI:10.3390/ijgi5100178
doi: 10.3390/ijgi5100178
5 SHAIKH S A, MARIAM K, KITAGAWA H, et al. GeoFlink: A distributed and scalable framework for the real-time processing of spatial streams[C]// Proceedings of the 29th ACM International Conference on Information & Knowledge Management. New York: Association for Computing Machinery, 2020: 3149-3156. DOI:10.1145/3340531. 3412761
doi: 10.1145/3340531. 3412761
6 NAQVI S N Z, YFANTIDOU S, ZIMÁNYI E. Time Series Databases and InfluxDB[M]. Studienarbeit:Université Libre de Bruxelles, 2017.
7 ZHANG T, YANG L H, SHEN D H, et al. An efficient in-memory R-tree construction scheme for spatio-temporal data stream[C]// International Conference on Service-Oriented Computing. Cham: Springer, 2018: 253-265. DOI:10.1007/978-3-030-17642-6_22
doi: 10.1007/978-3-030-17642-6_22
8 赵馨逸, 黄向东, 乔嘉林, 等. 基于不均匀空间划分和R树的时空索引[J]. 计算机研究与发展, 2019, 56(3): 666-676. DOI:10.7544/issn1000-1239.2019.20170750
ZHAO X Y, HUANG X D, QIAO J L, et al. A spatio-temporal index based on skew spatial coding and R-tree[J]. Journal of Computer Research and Development, 2019, 56(3): 666-676. DOI:10.7544/issn1000-1239.2019.20170750
doi: 10.7544/issn1000-1239.2019.20170750
9 VAN den BERCKEN J, SEEGER B, WIDMAYER P. A generic approach to bulk loading multidimensional index structures[C]// International Conference on Very Large Date Bases (VLDB) 1997. San Francisco: Morgan Kaufmann Publishers Inc, 1997, 97: 406-415. DOI:10.5555/645923.671016
doi: 10.5555/645923.671016
10 O'NEIL P, CHENG E, GAWLICK D, et al. The log-structured merge-tree[J]. Acta Informatica, 1996, 33(4): 351-385. DOI:10.1007/s002360050048
doi: 10.1007/s002360050048
11 MAO Q, QADER M A, HRISTIDIS V. Comprehensive comparison of LSM architectures for spatial data[C]// 2020 IEEE International Conference on Big Data. Atlanta: IEEE, 2020: 455-460. DOI:10.1109/BigData50022. 2020.9377919
doi: 10.1109/BigData50022. 2020.9377919
12 BÖXHM C, KLUMP G, KRIEGEL H P. XZ-Ordering: A Space-Filling Curve for Objects with Spatial Extension[M]. Berlin/Heidelberg: Springer, 1999: 75-90. doi:10.1007/3-540-48482-5_7
doi: 10.1007/3-540-48482-5_7
13 WANG L, CAI R C, FU T Z J, et al. Waterwheel: Realtime indexing and temporal range query processing over massive data streams[C]// 2018 IEEE 34th International Conference on Data Engineering (ICDE). Paris: IEEE, 2018: 269-280. DOI:10.1109/ICDE.2018.00033
doi: 10.1109/ICDE.2018.00033
14 蔡瑞初, 林峰极, 郝志峰, 等. 面向轨迹流数据的索引构建与存储方法研究[J]. 计算机工程, 2021, 47(3): 62-70. DOI:10.19678/j.issn.1000-3428.0057108
CAI R C, LIN F J, HAO Z F, et al. Research on index construction and storage method for trajectory stream data[J]. Computer Engineering, 2021, 47(3): 62-70. DOI:10.19678/j.issn.1000-3428.0057108
doi: 10.19678/j.issn.1000-3428.0057108
15 ALSUBAIEE S, BEHM A, BORKAR V, et al. Storage management in AsterixDB[J]. Proceedings of the VLDB Endowment, 2014, 7(10): 841-852. DOI:10.14778/2732951.2732958
doi: 10.14778/2732951.2732958
16 LARS G. HBase: The Definitive Guide[M]. Sebastopol: O' Reilly Media, Inc., 2010.
17 陆婷, 房俊, 乔彦克. 基于HBase的交通流数据实时存储系统[J]. 计算机应用, 2015, 35(1): 103-107, 135. DOI:10.11772/j.issn.1001-9081.2015.01.0103
LU T, FANG J, QIAO Y K. HBase-based real-time storage system for traffic stream data[J]. Journal of Computer Applications, 2015, 35(1): 103-107, 135. DOI:10.11772/j.issn.1001-9081. 2015.01.0103
doi: 10.11772/j.issn.1001-9081. 2015.01.0103
18 MA Y Z, RAO J, HU W S, et al. An efficient index for massive IOT data in cloud environment[C]// Proceedings of the 21th ACM International Conference on Information and Knowledge Management. New York: Association for Computing Machinery, 2012: 2129-2133. DOI:10.1145/2396761.2398587
doi: 10.1145/2396761.2398587
19 李欣. 基于Spark/HBase的交通流数据存储及索引模型探讨[J]. 地理与地理信息科学, 2019, 35(4): 1-8. DOI:10.3969/j.issn.1672-0504.2019.04.001
LI X. Discussion on the data storage and index model of traffic flow based on Spark/HBase[J]. Geography and Geo-Information Science, 2019, 35(4): 1-8. DOI:10.3969/j.issn.1672-0504.2019.04.001
doi: 10.3969/j.issn.1672-0504.2019.04.001
20 NISHIMURA S, DAS S, AGRAWAL D, et al. MD-HBase: A scalable multi-dimensional data infrastructure for location aware services[C]// 2011 IEEE 12th International Conference on Mobile Data Management. Lulea: IEEE, 2011, 1: 7-16. DOI:10.1109/MDM.2011.41
doi: 10.1109/MDM.2011.41
21 李攀宇, 贾宏. 基于HBase的交通数据时空分块索引[J]. 信息技术, 2019, 43(12): 116-120. DOI:10.13274/j.cnki.hdzj.2019.12.024
LI P Y, JIA H. Spatio-temporal block index for traffic data based on HBase[J]. Information Technology, 2019, 43(12): 116-120. DOI:10.13274/j.cnki.hdzj.2019.12.024
doi: 10.13274/j.cnki.hdzj.2019.12.024
22 CAI C R, LU Z J, WANG L, et al. DITIR: Distributed index for high throughput trajectory insertion and real-time temporal range query[J]. Proceedings of the VLDB Endowment, 2017, 10(12): 1865-1868. DOI:10.14778/3137765.3137795
doi: 10.14778/3137765.3137795
23 MOON B, JAGADISH H V, FALOUTSOS C, et al. Analysis of the clustering properties of the Hilbert space-filling curve[J]. IEEE Transactions on Knowledge and Data Engineering, 2001, 13(1): 124-141. DOI:10.1109/69.908985
doi: 10.1109/69.908985
24 LI K H. Reservoir-sampling algorithms of time complexity On(1+log(N/n)))[J]. ACM Transactions on Mathematical Software (TOMS), 1994, 20(4): 481-493. DOI:10.1145/198429.198435
doi: 10.1145/198429.198435
[1] 王昱文,杜震洪,戴震,刘仁义,张丰. 基于复合神经网络的多元水质指标预测模型[J]. 浙江大学学报(理学版), 2022, 49(3): 354-362.
[2] 汤求毅,王超,杜震洪,张丰,刘仁义. 基于时空老化模型的服务端瓦片缓存置换算法[J]. 浙江大学学报(理学版), 2022, 49(2): 210-218.