Please wait a minute...
浙江大学学报(工学版)
地球科学与工程     
面向并行空间连接的两轮映射数据划分方法
付仲良,赵星源,王楠,杨元维
1. 武汉大学 遥感信息工程学院,湖北 武汉 430079
2. 地球空间信息技术协同创新中心,湖北 武汉 430079
3. 中国科学院 遥感与数字地球研究所,北京 100094
Two-rounds-map spatial data partitioning method towards parallel spatial join
FU Zhong liang, ZHAO Xing yuan, WANG Nan, YANG Yuan wei
1.School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China;
2. Collaborative Innovation Center of Geospatial Technology, Wuhan 430079, China;
3. Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing 100094, China
 全文: PDF(2822 KB)   HTML
摘要:

针对数据划分结果高冗余、低均衡可能会增加系统的工作负荷和影响系统的负载均衡这一问题,提出两轮映射数据划分方法.在第一轮映射中,通过充分利用划分对象的空间属性来减少冗余数据,通过合理设置阈值来均衡划分数据;在第二轮映射中,通过动态映射机制,提高划分结果的数据量均衡度.与Oracle Spatial数据划分方法、线性编码轮询调度划分方法以及Hilbert编码轮询调度划分方法进行比较可知,采用两轮映射方法可以有效地控制冗余数据的产生,大幅提高划分结果的数据量均衡度,具备较好的划分效率.

Abstract:

Too much repeated spatial objects (the redundant data) which are contained by the partitioned data blocks and low-level data balance degree among the partitioned data blocks would increase the system’s work load and reduce the system’s load balance. The two-rounds-map partitioning method was proposed to solve the problem. Redundant data was reduced by making full use of the space attribute of the partitioned objects within the first round map, and spatial data was evenly partitioned by reasonable setting threshold. Data balance degree was improved by dynamic mapping mechanism within the second round map. Experimental results show that the two-rounds-map partitioning method can effectively reduce redundant data and dramatically improve the data balance degree of partitioned results compared with Oracle Spatial, Linear-Round-Robin and Hilbert-Round-Robin. The two-rounds-map partitioning method has a good computational efficiency.

出版日期: 2017-01-01
CLC:  P 208  
基金资助:

国家自然科学基金资助项目(41501391).

通讯作者: 赵星源,男,博士生.ORCID: 0000-0003-2812-3064.     E-mail: xyz880330@whu.edu.cn
作者简介: 付仲良(1965—),男,教授,博导,从事地理信息科学的研究.ORCID: 0000-0002-8907-2152. E-mail: fuzhl@263.net
服务  
把本文推荐给朋友
加入引用管理器
E-mail Alert
作者相关文章  

引用本文:

付仲良,赵星源,王楠,杨元维. 面向并行空间连接的两轮映射数据划分方法[J]. 浙江大学学报(工学版), 10.3785/j.issn.1008-973X.2017.01.027.

FU Zhong liang, ZHAO Xing yuan, WANG Nan, YANG Yuan wei. Two-rounds-map spatial data partitioning method towards parallel spatial join. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 10.3785/j.issn.1008-973X.2017.01.027.

[1] JACOX E H, SAMET H. Spatial join techniques [J]. ACM Transactions on Database Systems, 2007, 32(1): 14-23.
[2] SIDLAUSKAS D, JENSEN C S. Spatial joins in main memory: implementation matters! [J]. Proceedings of the VLDB Endowment, 2014, 8(1): 97-100.
[3] 付仲良,刘思远,俞志强.一种双映射变换的空间索引及空间连接算法研究[J].武汉大学学报:信息科学版,2014, 39(10): 1248-1251.
FU Zhongliang, LIU Siyuan, YU Zhiqiang. A novel spatial index with a highperformance spatial join [J]. Geomatics and Information Science of Wuhan University, 2014, 39(10): 1248-1251.
[4] ZHOU X F, ABEL D J, TRUFFET D. Data partitioning for parallel spatial join processing [J]. Geoinformatica, 1998, 2(2): 175-204.
[5] HOEL E G, SAMET H. Dataparallel spatial join algorithms [C]∥International Conference on Parallel Processing. Illinois: IEEE, 1994: 227-234.
[6] ELDAWY A, MOKBEL M F. SpatialHadoop: a MapReduce framework for spatial data [C]∥ 2015 IEEE 31st International Conference on Data Engineering (ICDE). Seoul: IEEE, 2015: 1352-1363.
[7] ZHANG J T, YOU S M, GRUENWALD L. Lightweight distributed execution engine for largescale spatial join query processing [C]∥ 2015 IEEE International Congress on Big Data (BigData Congress). Santa Clara: IEEE, 2015: 150-157.
[8] 陈勇旭,陈梦杰,刘雪冰,等.基于MapReduce的连接聚集查询算法研究[J].计算机研究与发展,2013,50(增1): 306-311.
CHEN Yongxu, CHEN Mengjie, LIU Xuebing, et al. MapReduce based aggregatejoin query algorithms [J]. Journal of Computer Research and Development, 2013, 50(supple.1): 306311.
[9] GUTTMAN A. RTrees: a dynamic index structure for spatial searching [J]. SIGMOD84, 1984, 14(2):47-57.
[10] BRINKHOFF T, KRIEGEL H P, SEEGER B. Parallel processing of spatial joins using Rtrees [C]∥Proceedings of the 12th International Conference on Data Engineering. New Orleans: IEEE, 1996: 258-265.
[11] MUTENDA I, KITSUREGAWA M. Parallel Rtree spatial join for a sharednothing architecture [C]∥1999 International Symposium on Database Applications in NonTraditional Environments. Kyoto: IEEE, 1999: 423-430.
[12] 刘义,陈荦,景宁,等.基于R树索引的MapReduce空间连接聚集操作[J].国防科技大学学报,2013,35(1): 136-141.
LIU Yi, CHEN Luo, JING Ning, et al. Processing spatial join aggregate in MapReduce based on Rtree [J]. Journal of National University of Defense Technology, 2013, 35(1): 136-141.
[13] BECKMANN N, KRIEGEL H P, SCHNEIDER R, et al. The r*tree: an efficient and robust access method for points and rectangles [J]. ACM SIGMOD Record, 1990, 19(2): 322331.
[14] SELLIS T K, ROUSSOPOULOS N, FALOUTSOS C. The R+Tree: a dynamic index for multidimensional objects [C]∥International Conference on Very Large Data Bases. Brighton: Morgan Kaufmann, 1987: 507-518.
[15] ELDAWY A, MOKBEL M F. A demonstration of SpatialHadoop: an efficient mapreduce framework for spatial data [J]. Proceedings of the Vldb Endowment, 2013, 6(12): 1230-1233.
[16] 张丽芬,王晓华,胡景松,等.基于网格划分的几种空间索引[J].北京理工大学学报,2004, 24(2): 140-144.
ZHANG Lifen, WANG Xiaohua, HU Jingsong, et al. Spatial indices based on grid partition [J]. Transactions of Beijing Institute of Technology, 2004, 24(2): 140-144.
[17] 赵春宇,孟令奎,林志勇.一种面向并行空间数据库的数据划分算法研究[J].武汉大学学报:信息科学版,2006, 31(11): 962-965.
ZHAO Chunyu, MENG Lingkui, LIN Zhiyong. Spatial data partitioning towards parallel spatial database system [J]. Geomatics and Information Science of Wuhan University, 2006, 31(11): 962965.
[18] 王永杰,孟令奎,赵春宇.基于Hilbert空间排列码的海量空间数据划分算法研究[J].武汉大学学报:信息科学版,2007, 32(7): 650-653.
WANG Yongjie, MENG Lingkui, ZHAO Chun yu. Spatial partitioning of massive data based on Hilbert spatial ordering code [J]. Geomatics and Information Science of Wuhan University, 2007, 32(7): 650-653.
[19] PATEL J M, DEWITT D J. Partition based spatialmerge join [C]∥ACM SIGMOD Record. Montreal: ACM, 1996, 25(2): 259-270.
[20] ZHANG S B, HAN J Z, LIU Z Y, et al. Sjmr: parallelizing spatial join with mapreduce on clusters [C]∥IEEE International Conference on Cluster Computing and Workshops. New Orleans: IEEE, 2009: 18.
[21] DEWITT D J, NAUGHTON J F, SCHNEIDER D A, et al. Practical skew handling in parallel joins[C]∥Proceedings of the 18th VLDB Conference. Vancouver: VLDB, 1992: 2740.
[22] LAM W M, SHAPIRO J M. A class of fast algorithms for the PeanoHilbert spacefilling curve [C]∥IEEE International Conference on Image Processing. Austin: IEEE, 1994: 638-641.

[1] 孙乐乐,金宝轩. 两步解码式空间矢量数据并行转换算法[J]. 浙江大学学报(工学版), 2020, 54(9): 1768-1776.
[2] 陈华锋, 叶时平, 黄智才, 等. 融合不规则三角网和遗传算法的大洋科考航线设计方法[J]. J4, 2009, 43(11): 1951-1957.