Journal of ZheJiang University (Engineering Science)  2020, Vol. 54 Issue (9): 1768-1776    DOI: 10.3785/j.issn.1008-973X.2020.09.013
    
Spatial vector data parallel conversion algorithm based on two-step decoding
Le-le SUN1, Bao-xuan JIN2,*
1. College of Tourism and Geography Science, Yunnan Normal University, Kunming 650500, China
2. Yunnan Provincial Department of Natural Resources, Kunming 650224, China

Abstract  

To address the poor scalability and data skew of traditional single-machine conversion tools and RangePartitioner-based parallel methods, a parallel conversion algorithm for spatial vector data (SVD) based on two-step decoding was proposed. An optimized geometry-parsing function was built as the basic decoding tool by summarizing the storage encoding schema of SVD in the geospatial database (GDB). In the first decoding step, only the spatial metadata was parsed, and parsing tasks were balanced according to geometry complexity so that the parsing workload better matched the data volume. In the second decoding step, the compressed geometry bytes were extracted and parsed with a parallel geometry-parsing mechanism to improve conversion efficiency. The algorithm was implemented on Apache Spark and compared with the ArcGIS conversion tool and the RangePartitioner-based parallel query conversion algorithm. The experimental results show that the proposed algorithm has significant advantages in efficiency and performance scalability: conversion efficiency improved by 2.5 to 117 times, and the data skew caused by uneven geometric complexity was greatly reduced.
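The first-step balancing can be illustrated with a minimal sketch. The abstract does not spell out the balancing rule, so the code below assumes a greedy longest-first assignment in which each feature's point count (as read from its metadata) stands in for geometry complexity. The function name `balance_by_complexity` and the sample counts are hypothetical.

```python
import heapq

def balance_by_complexity(point_counts, num_tasks):
    """Greedy longest-first assignment (an assumed rule, not the paper's).

    point_counts: dict mapping feature id -> number of points in its geometry.
    Returns a list of (total_points, [feature ids]), one entry per task.
    """
    # Min-heap of (current load, task index) keeps the lightest task on top.
    heap = [(0, i) for i in range(num_tasks)]
    heapq.heapify(heap)
    tasks = [[] for _ in range(num_tasks)]
    loads = [0] * num_tasks
    # Place the most complex features first; each goes to the lightest task.
    ordered = sorted(point_counts.items(), key=lambda kv: kv[1], reverse=True)
    for fid, n in ordered:
        load, i = heapq.heappop(heap)
        tasks[i].append(fid)
        loads[i] = load + n
        heapq.heappush(heap, (loads[i], i))
    return [(loads[i], tasks[i]) for i in range(num_tasks)]

# Hypothetical metadata: one very complex feature and several simple ones.
counts = {"a": 9000, "b": 3000, "c": 3000, "d": 2000, "e": 1000}
plan = balance_by_complexity(counts, 2)
```

In this example both tasks end up with 9000 points. Splitting the same five features by count alone (for example a, b, c versus d, e) would give 15000 and 3000 points.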



Key words: geographic information system; spatial vector data (SVD); data parallel conversion; data skew
Received: 02 August 2019      Published: 22 September 2020
CLC:  P 208  
Corresponding Authors: Bao-xuan JIN     E-mail: sycamoresun@foxmail.com;jinbx163@163.com
Cite this article:

Le-le SUN,Bao-xuan JIN. Spatial vector data parallel conversion algorithm based on two-step decoding. Journal of ZheJiang University (Engineering Science), 2020, 54(9): 1768-1776.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2020.09.013     OR     http://www.zjujournals.com/eng/Y2020/V54/I9/1768


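Two-step decoding parallel conversion algorithm for spatial vector data

Traditional single-machine conversion tools and parallel conversion algorithms based on range partitioning suffer from poor scalability and data skew, so a two-step decoding parallel conversion algorithm for spatial vector data (SVD) is proposed. By summarizing the storage encoding schema of spatial vector data in the geospatial database (GDB), an optimized geometry decoding function is built as the basic tool. First decoding: only the spatial metadata is parsed, and parsing tasks are balanced according to geometry complexity, improving the balance between parsing workload and data volume. Second decoding: the compressed geometry bytes are extracted and parsed with a parallel geometry-parsing mechanism, improving conversion efficiency. The algorithm is implemented on Spark. Comparison with the ArcGIS single-machine conversion tool and the range-partitioning-based parallel query conversion algorithm shows that the proposed algorithm has significant advantages in efficiency and performance scalability: conversion efficiency improves by 2.5 to 117 times, and the data skew caused by uneven geometry complexity is greatly reduced.


Key words: geographic information system, spatial vector data (SVD), data parallel conversion, data skew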
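Byte | Content
0 | Byte-count flag; the high bit is the sign bit; fixed value 4
1~4 | Header meta-information; can be skipped
5~6 | Total number of bytes of the STRUCT, in binary
7~m | Corresponding ST_GEOMETRY encoded block; m is the last position of the byte array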
Tab.1 Encoding mode of SHAPE in ArcSDE for Oracle
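The byte layout in Tab.1 can be read with a few slices. The sketch below is illustrative only: the table does not state the byte order of the two-byte STRUCT length, so an unsigned big-endian integer is assumed, and the function name `split_shape_blob` and the sample bytes are hypothetical.

```python
def split_shape_blob(blob: bytes):
    """Split a SHAPE byte array following the layout in Tab.1.

    Assumptions (not stated in the table): the two-byte STRUCT length is an
    unsigned big-endian integer, and byte 0 is checked only against the
    fixed value 4.
    """
    if len(blob) < 7:
        raise ValueError("blob shorter than the 7-byte prefix")
    if blob[0] != 4:
        raise ValueError("unexpected marker byte: %d" % blob[0])
    # Bytes 1-4 hold header meta-information and are skipped.
    struct_len = int.from_bytes(blob[5:7], "big")  # bytes 5-6
    geometry = blob[7:]                            # bytes 7..m
    return struct_len, geometry

# Hypothetical blob: marker, 4 header bytes, length 3, then 3 payload bytes.
sample = bytes([4]) + bytes(4) + (3).to_bytes(2, "big") + b"\x01\x02\x03"
struct_len, geometry = split_shape_blob(sample)
```

The function returns the declared length without comparing it to the payload size, because the table does not say exactly which bytes the count covers.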
Fig.1 Implementation process of spatial vector data parallel conversion algorithm based on two-step decoding
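Dataset | Region | Features/M | Constituent points/M | Data size/GB
${D_0}$ | Nujiang Prefecture | 0.103 | 11.939 | 0.512
${D_1}$ | 33° zone | 0.041 | 44.305 | 1.300
${D_2}$ | 35° zone | 1.996 | 192.056 | 6.300
${D_3}$ | Yunnan Province | 8.280 | 730.922 | 22.900
${D_4}$ | Yunnan Province | 8.396 | 2069.057 | 56.730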
Tab.2 Information of experimental datasets
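As a rough reading of Tab.2, the average number of points per feature can be computed for each dataset. Treating this ratio as a proxy for geometry complexity is an assumption made here for illustration, not a measure defined in the text above.

```python
# Values copied from Tab.2: (features in millions, points in millions).
datasets = {
    "D0": (0.103, 11.939),
    "D1": (0.041, 44.305),
    "D2": (1.996, 192.056),
    "D3": (8.280, 730.922),
    "D4": (8.396, 2069.057),
}

# Both columns are in millions, so the ratio is points per feature.
avg_points = {
    name: points / features for name, (features, points) in datasets.items()
}

for name, avg in avg_points.items():
    print(f"{name}: {avg:.1f} points per feature")
```

Under this proxy, D1 stands out at roughly 1080 points per feature, about ten times the figure for D0, D2 and D3, which suggests that its geometries are much more complex on average.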
Fig.2 Comparison of query response time between ST_AsText and ST_AsBinary functions with different query data sizes
Conversion method | ${D_0}$ | ${D_1}$ | ${D_2}$ | ${D_3}$ | ${D_4}$
ArcGIS | 5.60 | 16.42 | 87.80 | 252.40 | 689.22
PSGD | 1.90 | 2.68 | 8.93 | 49.98 | 53.70
One-step decoding | 0.53 | 2.08 | 2.67 | 4.67 | 6.73
Two-step decoding | 0.51 | 1.08 | 1.73 | 4.03 | 5.87
Tab.3 Comparison of execution time by various conversion methods for different datasets (unit: min)
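The speedup range quoted in the abstract can be checked against Tab.3 by dividing the ArcGIS and PSGD execution times by the two-step decoding times. The short calculation below uses only the values in the table; the variable names are illustrative.

```python
# Execution times in minutes for D0..D4, copied from Tab.3.
times = {
    "ArcGIS": [5.60, 16.42, 87.80, 252.40, 689.22],
    "PSGD": [1.90, 2.68, 8.93, 49.98, 53.70],
    "two-step": [0.51, 1.08, 1.73, 4.03, 5.87],
}

# Speedup of two-step decoding over each baseline, per dataset.
speedups = {
    method: [t / t2 for t, t2 in zip(times[method], times["two-step"])]
    for method in ("ArcGIS", "PSGD")
}

lowest = min(min(values) for values in speedups.values())
highest = max(max(values) for values in speedups.values())
print(f"{lowest:.1f}x to {highest:.1f}x")  # prints "2.5x to 117.4x"
```

The smallest ratio comes from PSGD on D1 (2.68 / 1.08) and the largest from ArcGIS on D4 (689.22 / 5.87), which agrees with the reported improvement of 2.5 to 117 times.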
Fig.3 Comparison of cluster CPU utilization by various parallel conversion methods for different datasets
Fig.4 Comparison of write speed in cluster disk by various parallel conversion methods for different datasets
Fig.5 Comparison of skewness in execution time by various parallel conversion methods for different datasets
Fig.6 Comparison of block skewness by various parallel conversion methods for different datasets