Please wait a minute...
Journal of Zhejiang University (Science Edition)  2023, Vol. 50 Issue (2): 204-212    DOI: 10.3785/j.issn.1008-9497.2023.02.010
Earth Science     
Distributed scheduling and storage scheme based on LSM-OCTree for spatiotemporal stream
Yueyi LI1,2,Feng ZHANG1,2(),Zhenhong DU1,2,Renyi LIU1,2
1.Zhejiang Provincial Key Lab of GIS,Zhejiang University,Hangzhou 310028,China
2.Department of Geographic Information Science,Zhejiang University,Hangzhou 310027,China
Download: HTML( 5 )   PDF(2545KB)
Export: BibTeX | EndNote (RIS)      

Abstract  

Efficient management of spatiotemporal stream requires to take spatiotemporal correlation into account and support high-speed insertion, real-time indexing and low delay spatiotemporal range query. However, due to the high cost of index update, existing scheduling and storage schemes based on HBase can hardly meet those requirements. According to the application characteristics of spatiotemporal stream, a time-slicing oriented distributed scheduling and storage method is proposed. The tight coupling of spatiotemporal stream is used for data division and scheduling to reduce the overhead of data replication during query. To achieve both index update performance and query efficiency of spatiotemporal stream, octree based on the log-structured merge-tree (LSM-OCTree) is constructed by pre-partition as the storage structure. Efficient batch merging method is employed to improve the overall query performance. Experimental results show that the spatiotemporal dynamic scheduling strategy is better than the general scheduling method, and the merging and updating performance of LSM-OCTree index is better than that of conventional index structure. Compared with HBase scheme, the query performance of distributed storage scheme based on LSM-OCTree is over 20% better. In summary, the distributed scheduling and storage scheme considers both index update and range query performance, and has high efficiency.



Key wordsspatiotemporal stream      spatiotemporal scheduling      LSM-OCTree      distributed storage     
Received: 14 February 2022      Published: 21 March 2023
CLC:  P 208  
Corresponding Authors: Feng ZHANG     E-mail: zfcarnation@zju.edu.cn
Cite this article:

Yueyi LI,Feng ZHANG,Zhenhong DU,Renyi LIU. Distributed scheduling and storage scheme based on LSM-OCTree for spatiotemporal stream. Journal of Zhejiang University (Science Edition), 2023, 50(2): 204-212.

URL:

https://www.zjujournals.com/sci/EN/Y2023/V50/I2/204


基于LSM-OCTree的时空流分布式调度和存储方案

时空流的高效管理要求顾及数据的时空相关性,支持时空流的高速插入、实时索引和低延迟时空范围查询,而现有的基于HBase等的存储方案,因索引更新开销过大,无法满足高效管理要求。针对时空流的应用特性,提出了一种面向时间分片的时空流分布式调度和存储方法。利用时空流的紧耦合性进行数据划分与调度,以减少查询时数据复制的开销。将采用预分区方式构建的基于日志结构合并树的八叉树(octree based on the log-structured merge-tree,LSM-OCTree)索引作为存储结构,保证时空流的索引更新,实现索引的高效批量合并计算,提高查询性能。实验结果表明,时空动态调度策略优于通用的调度方法,LSM-OCTree索引的合并与更新性能优于常规索引结构。与HBase方案相比,基于LSM-OCTree的时空流分布式调度和存储方案的查询效率提升了20%以上。


关键词: 时空流,  时空调度,  LSM-OCTree,  分布式存储 
Fig.1 Distributed spatiotemporal scheduling
Fig.2 Structure of LSM-OCTree
Fig.3 Merge method of LSM-OCTree
Fig.4 Construction of distributed LSM-OCTree for time slice
Fig.5 Spatial distribution of experimental data
Fig.6 Query performance comparison of three dispatch methods
Fig.7 Computing performance comparison of continuous query tasks using spatiotemporal scheduling and conventional scheduling
Fig.8 Merge performance comparison of three methods
Fig.9 Update performance comparison of LSM-OCTree and ordinary octree
Fig.10 Query performance comparison of three models under
[1]   KAZEMITABAR S J, DEMIRYUREK U, ALI M, et al. Geospatial stream query processing using Microsoft SQL Server Stream Insight[J]. Proceedings of the VLDB Endowment, 2010, 3(1/2): 1537-1540. DOI:10.14778/1920841.1921032
doi: 10.14778/1920841.1921032
[2]   杨良怀, 沈东海, 范玉雷, 等. 面向时空流的移动对象空间索引构建[J]. 电子学报, 2021, 49(5): 992-1000. DOI:10.12263/DZXB.20200300
YANG L H, SHEN D H, FAN Y L, et al. A moving object spatial index for spatio-temporal data stream[J]. Acta Electronica Sinica, 2021, 49(5): 992-1000. DOI:10.12263/DZXB.20200300
doi: 10.12263/DZXB.20200300
[3]   CARBONE P, KATSIFODIMOS A, EWEN S, et al. Apache FlinkTM: Stream and batch processing in a single engine[J]. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 2015, 36(4): 28-38.
[4]   ZHANG F, ZHENG Y, XU D P, et al. Real-time spatial queries for moving objects using storm topology[J]. ISPRS International Journal of Geo-Information, 2016, 5(10): 178. DOI:10.3390/ijgi5100178
doi: 10.3390/ijgi5100178
[5]   SHAIKH S A, MARIAM K, KITAGAWA H, et al. GeoFlink: A distributed and scalable framework for the real-time processing of spatial streams[C]// Proceedings of the 29th ACM International Conference on Information & Knowledge Management. New York: Association for Computing Machinery, 2020: 3149-3156. DOI:10.1145/3340531. 3412761
doi: 10.1145/3340531. 3412761
[6]   NAQVI S N Z, YFANTIDOU S, ZIMÁNYI E. Time Series Databases and InfluxDB[M]. Studienarbeit:Université Libre de Bruxelles, 2017.
[7]   ZHANG T, YANG L H, SHEN D H, et al. An efficient in-memory R-tree construction scheme for spatio-temporal data stream[C]// International Conference on Service-Oriented Computing. Cham: Springer, 2018: 253-265. DOI:10.1007/978-3-030-17642-6_22
doi: 10.1007/978-3-030-17642-6_22
[8]   赵馨逸, 黄向东, 乔嘉林, 等. 基于不均匀空间划分和R树的时空索引[J]. 计算机研究与发展, 2019, 56(3): 666-676. DOI:10.7544/issn1000-1239.2019.20170750
ZHAO X Y, HUANG X D, QIAO J L, et al. A spatio-temporal index based on skew spatial coding and R-tree[J]. Journal of Computer Research and Development, 2019, 56(3): 666-676. DOI:10.7544/issn1000-1239.2019.20170750
doi: 10.7544/issn1000-1239.2019.20170750
[9]   VAN den BERCKEN J, SEEGER B, WIDMAYER P. A generic approach to bulk loading multidimensional index structures[C]// International Conference on Very Large Date Bases (VLDB) 1997. San Francisco: Morgan Kaufmann Publishers Inc, 1997, 97: 406-415. DOI:10.5555/645923.671016
doi: 10.5555/645923.671016
[10]   O'NEIL P, CHENG E, GAWLICK D, et al. The log-structured merge-tree[J]. Acta Informatica, 1996, 33(4): 351-385. DOI:10.1007/s002360050048
doi: 10.1007/s002360050048
[11]   MAO Q, QADER M A, HRISTIDIS V. Comprehensive comparison of LSM architectures for spatial data[C]// 2020 IEEE International Conference on Big Data. Atlanta: IEEE, 2020: 455-460. DOI:10.1109/BigData50022. 2020.9377919
doi: 10.1109/BigData50022. 2020.9377919
[12]   BÖXHM C, KLUMP G, KRIEGEL H P. XZ-Ordering: A Space-Filling Curve for Objects with Spatial Extension[M]. Berlin/Heidelberg: Springer, 1999: 75-90. doi:10.1007/3-540-48482-5_7
doi: 10.1007/3-540-48482-5_7
[13]   WANG L, CAI R C, FU T Z J, et al. Waterwheel: Realtime indexing and temporal range query processing over massive data streams[C]// 2018 IEEE 34th International Conference on Data Engineering (ICDE). Paris: IEEE, 2018: 269-280. DOI:10.1109/ICDE.2018.00033
doi: 10.1109/ICDE.2018.00033
[14]   蔡瑞初, 林峰极, 郝志峰, 等. 面向轨迹流数据的索引构建与存储方法研究[J]. 计算机工程, 2021, 47(3): 62-70. DOI:10.19678/j.issn.1000-3428.0057108
CAI R C, LIN F J, HAO Z F, et al. Research on index construction and storage method for trajectory stream data[J]. Computer Engineering, 2021, 47(3): 62-70. DOI:10.19678/j.issn.1000-3428.0057108
doi: 10.19678/j.issn.1000-3428.0057108
[15]   ALSUBAIEE S, BEHM A, BORKAR V, et al. Storage management in AsterixDB[J]. Proceedings of the VLDB Endowment, 2014, 7(10): 841-852. DOI:10.14778/2732951.2732958
doi: 10.14778/2732951.2732958
[16]   LARS G. HBase: The Definitive Guide[M]. Sebastopol: O' Reilly Media, Inc., 2010.
[17]   陆婷, 房俊, 乔彦克. 基于HBase的交通流数据实时存储系统[J]. 计算机应用, 2015, 35(1): 103-107, 135. DOI:10.11772/j.issn.1001-9081.2015.01.0103
LU T, FANG J, QIAO Y K. HBase-based real-time storage system for traffic stream data[J]. Journal of Computer Applications, 2015, 35(1): 103-107, 135. DOI:10.11772/j.issn.1001-9081. 2015.01.0103
doi: 10.11772/j.issn.1001-9081. 2015.01.0103
[18]   MA Y Z, RAO J, HU W S, et al. An efficient index for massive IOT data in cloud environment[C]// Proceedings of the 21th ACM International Conference on Information and Knowledge Management. New York: Association for Computing Machinery, 2012: 2129-2133. DOI:10.1145/2396761.2398587
doi: 10.1145/2396761.2398587
[19]   李欣. 基于Spark/HBase的交通流数据存储及索引模型探讨[J]. 地理与地理信息科学, 2019, 35(4): 1-8. DOI:10.3969/j.issn.1672-0504.2019.04.001
LI X. Discussion on the data storage and index model of traffic flow based on Spark/HBase[J]. Geography and Geo-Information Science, 2019, 35(4): 1-8. DOI:10.3969/j.issn.1672-0504.2019.04.001
doi: 10.3969/j.issn.1672-0504.2019.04.001
[20]   NISHIMURA S, DAS S, AGRAWAL D, et al. MD-HBase: A scalable multi-dimensional data infrastructure for location aware services[C]// 2011 IEEE 12th International Conference on Mobile Data Management. Lulea: IEEE, 2011, 1: 7-16. DOI:10.1109/MDM.2011.41
doi: 10.1109/MDM.2011.41
[21]   李攀宇, 贾宏. 基于HBase的交通数据时空分块索引[J]. 信息技术, 2019, 43(12): 116-120. DOI:10.13274/j.cnki.hdzj.2019.12.024
LI P Y, JIA H. Spatio-temporal block index for traffic data based on HBase[J]. Information Technology, 2019, 43(12): 116-120. DOI:10.13274/j.cnki.hdzj.2019.12.024
doi: 10.13274/j.cnki.hdzj.2019.12.024
[22]   CAI C R, LU Z J, WANG L, et al. DITIR: Distributed index for high throughput trajectory insertion and real-time temporal range query[J]. Proceedings of the VLDB Endowment, 2017, 10(12): 1865-1868. DOI:10.14778/3137765.3137795
doi: 10.14778/3137765.3137795
[23]   MOON B, JAGADISH H V, FALOUTSOS C, et al. Analysis of the clustering properties of the Hilbert space-filling curve[J]. IEEE Transactions on Knowledge and Data Engineering, 2001, 13(1): 124-141. DOI:10.1109/69.908985
doi: 10.1109/69.908985
[24]   LI K H. Reservoir-sampling algorithms of time complexity On(1+log(N/n)))[J]. ACM Transactions on Mathematical Software (TOMS), 1994, 20(4): 481-493. DOI:10.1145/198429.198435
doi: 10.1145/198429.198435
[1] Yuwen WANG,Zhenhong DU,Zhen DAI,Renyi LIU,Feng ZHANG. Multivariate water quality parameter prediction model based on hybrid neural network[J]. Journal of Zhejiang University (Science Edition), 2022, 49(3): 354-362.
[2] Qiuyi TANG,Chao WANG,Zhenhong DU,Feng ZHANG,Renyi LIU. Server-side cache replacement algorithm based on spatiotemporal aging model for tiles[J]. Journal of Zhejiang University (Science Edition), 2022, 49(2): 210-218.