Research on the analysis and statistic of geographical conditions based on the strategy of “Grid Index + MapReduce”
LIN Yaping1,2, DU Zhenhong1,2, ZHANG Feng1,2, LIU Renyi1,2
1. Zhejiang Provincial Key Lab of GIS, Zhejiang University, Hangzhou 310028, China;
2. Department of Geographic Information Science, Zhejiang University, Hangzhou 310027, China
Abstract: Statistical analysis of geographical conditions is a prerequisite for in-depth mining and application of geographical data. However, traditional centralized storage and processing on a single computer is time-consuming and inefficient, and in some cases cannot handle the data at all. This paper proposes a strategy called "Grid Index + MapReduce" to address these problems. First, we design a block-based file organization and distributed storage scheme for geographical conditions census data, built on a regular square grid. We then develop a two-layer filtering method that combines grid-index filtering with exact analysis. Finally, we build a MapReduce-based parallel algorithm for computing geographical conditions statistics. Performance tests comparing "Grid Index + MapReduce", MapReduce without an index, and ArcGIS show that the proposed strategy is much more efficient than ArcGIS and also clearly outperforms the indexless MapReduce method. The study aims to provide an optimized scheme for high-performance statistical analysis of multi-type, high-volume geographical conditions survey data.
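To make the three steps in the abstract concrete, the following is a minimal single-process sketch, not the authors' implementation. It assumes features are axis-aligned boxes tagged with a land-cover class, that each feature is clipped to its grid cell when the blocks are built (so that no area is counted twice), and that the statistic of interest is total area per class inside a rectangular query region. The cell size `CELL`, all function names, and the sample data are hypothetical; the map and reduce phases are simulated with plain Python functions instead of a Hadoop job.

```python
from collections import defaultdict
from math import floor

CELL = 10.0  # hypothetical side length of one square grid cell


def intersect(a, b):
    """Overlap of two boxes (xmin, ymin, xmax, ymax), or None if the area is zero."""
    xmin, ymin = max(a[0], b[0]), max(a[1], b[1])
    xmax, ymax = min(a[2], b[2]), min(a[3], b[3])
    if xmin < xmax and ymin < ymax:
        return (xmin, ymin, xmax, ymax)
    return None


def area(box):
    return (box[2] - box[0]) * (box[3] - box[1])


def cells_covering(box):
    """Grid cells (col, row) that the box may overlap."""
    c0, r0 = floor(box[0] / CELL), floor(box[1] / CELL)
    c1, r1 = floor(box[2] / CELL), floor(box[3] / CELL)
    return [(c, r) for c in range(c0, c1 + 1) for r in range(r0, r1 + 1)]


def cell_box(cell):
    c, r = cell
    return (c * CELL, r * CELL, (c + 1) * CELL, (r + 1) * CELL)


def build_blocks(features):
    """Step 1: partition features into per-cell blocks, clipping each to its cell."""
    blocks = defaultdict(list)
    for cls, box in features:
        for cell in cells_covering(box):
            piece = intersect(box, cell_box(cell))
            if piece:
                blocks[cell].append((cls, piece))
    return blocks


def map_block(block, region):
    """Map phase: exact test on every piece in one block, emitting (class, area)."""
    for cls, piece in block:
        hit = intersect(piece, region)
        if hit:
            yield cls, area(hit)


def reduce_pairs(pairs):
    """Reduce phase: sum the emitted areas per class."""
    totals = defaultdict(float)
    for cls, a in pairs:
        totals[cls] += a
    return dict(totals)


def grid_statistics(blocks, region):
    # Filter layer 1: the grid index keeps only cells that may overlap the region.
    candidates = [c for c in cells_covering(region) if c in blocks]
    # Filter layer 2: exact intersection inside each candidate block.
    pairs = (kv for c in candidates for kv in map_block(blocks[c], region))
    return reduce_pairs(pairs)


# Hypothetical sample data: (class, bounding box).
features = [
    ("forest", (0, 0, 15, 5)),
    ("water", (12, 2, 18, 8)),
    ("forest", (30, 30, 35, 35)),
]
blocks = build_blocks(features)
region = (5, 0, 20, 10)
result = grid_statistics(blocks, region)
print(result)
```

In this sketch the block for cell (3, 3) is never read, because the grid index excludes it before any exact test runs. In a distributed setting each candidate block would be processed by a separate map task.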
LIN Yaping, DU Zhenhong, ZHANG Feng, LIU Renyi. Research on the analysis and statistic of geographical conditions based on the strategy of “Grid Index + MapReduce”[J]. Journal of Zhejiang University (Science Edition), 2017, 44(6): 660-665.