Journal of Zhejiang University (Science Edition), 2017, Vol. 44, Issue (6): 660-665    DOI: 10.3785/j.issn.1008-9497.2017.06.004
Earth Sciences
Research on the analysis and statistic of geographical conditions based on the strategy of “Grid Index + MapReduce”
LIN Yaping1,2, DU Zhenhong1,2, ZHANG Feng1,2, LIU Renyi1,2
1. Zhejiang Provincial Key Lab of GIS, Zhejiang University, Hangzhou 310028, China;
2. Department of Geographic Information Science, Zhejiang University, Hangzhou 310027, China
Abstract: Statistical analysis of geographical conditions is the primary prerequisite for in-depth mining and application of geographical conditions census data. However, traditional centralized data storage and processing on a single computer is time-consuming and inefficient, and some analyses are not supported at all. To solve these problems, this paper proposes a strategy called "Grid Index + MapReduce". First, we design a block-based file organization and distributed storage scheme for the census data based on a regular square grid. Second, we develop a two-layer filtering mechanism that combines grid-index filtering with exact spatial analysis. Third, we build a MapReduce-based parallel algorithm for computing statistics of geographical conditions. Finally, performance comparisons among "Grid Index + MapReduce", index-free MapReduce, and ArcGIS show that "Grid Index + MapReduce" is far more efficient than ArcGIS and also has a clear efficiency advantage over index-free MapReduce. The study aims to provide a preferred scheme for high-performance, multi-type, high-volume statistical analysis of geographical conditions census data.
Key words: statistical analysis of geographical conditions; geographical conditions census data; grid index; MapReduce
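The abstract names three stages: partitioning the census data into regular grid blocks, a two-layer filter (grid index first, exact spatial analysis second), and MapReduce aggregation. The sketch below simulates that pipeline in a single process, under stated assumptions: features are axis-aligned rectangles with a land-cover class rather than real census polygons, the statistic is total area per class inside a rectangular region, the cell size is arbitrary, and every name (`Feature`, `partition`, `map_block`, `reduce_by_class`, `grid_mapreduce_stats`) is hypothetical. Features crossing cell boundaries are replicated into each cell they touch, and double counting is avoided with the reference-point rule (only the cell holding the lower-left corner of the feature/region overlap reports it). The abstract does not say how the authors handle duplicates, so that rule is an assumption, not their method.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import floor
from typing import Dict, Iterable, Iterator, List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Cell = Tuple[int, int]                    # grid cell index (i, j)


@dataclass(frozen=True)
class Feature:
    """Hypothetical census feature, simplified to an axis-aligned rectangle."""
    fid: int
    cls: str  # hypothetical land-cover class code
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def cells_for_box(xmin, ymin, xmax, ymax, cell):
    """Yield every grid cell (i, j) that the bounding box touches."""
    for i in range(floor(xmin / cell), floor(xmax / cell) + 1):
        for j in range(floor(ymin / cell), floor(ymax / cell) + 1):
            yield (i, j)


def partition(features: Iterable[Feature], cell: float) -> Dict[Cell, List[Feature]]:
    """Block organization: replicate each feature into every cell it touches."""
    blocks: Dict[Cell, List[Feature]] = defaultdict(list)
    for f in features:
        for key in cells_for_box(f.xmin, f.ymin, f.xmax, f.ymax, cell):
            blocks[key].append(f)
    return blocks


def map_block(key: Cell, feats: List[Feature], region: Rect,
              cell: float) -> Iterator[Tuple[str, float]]:
    """Map task for one grid block: two-layer filter, then emit (class, area)."""
    rx0, ry0, rx1, ry1 = region
    cx0, cy0 = key[0] * cell, key[1] * cell
    # Layer 1 (grid index): drop the whole block if its cell misses the region.
    if cx0 + cell <= rx0 or cx0 >= rx1 or cy0 + cell <= ry0 or cy0 >= ry1:
        return
    for f in feats:
        # Layer 2 (exact analysis): clip the feature rectangle to the region.
        ix0, iy0 = max(f.xmin, rx0), max(f.ymin, ry0)
        ix1, iy1 = min(f.xmax, rx1), min(f.ymax, ry1)
        if ix0 >= ix1 or iy0 >= iy1:
            continue  # no overlap with positive area
        # Reference-point rule: only the cell holding the overlap's
        # lower-left corner reports it, so replicated features count once.
        if (floor(ix0 / cell), floor(iy0 / cell)) != key:
            continue
        yield f.cls, (ix1 - ix0) * (iy1 - iy0)


def reduce_by_class(pairs: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Reduce step: sum the emitted areas per class."""
    totals: Dict[str, float] = defaultdict(float)
    for cls, area in pairs:
        totals[cls] += area
    return dict(totals)


def grid_mapreduce_stats(features, region: Rect, cell: float = 10.0) -> Dict[str, float]:
    """Single-process stand-in for the partition -> map -> reduce pipeline."""
    blocks = partition(features, cell)
    emitted = (kv for key, feats in blocks.items()
               for kv in map_block(key, feats, region, cell))
    return reduce_by_class(emitted)


if __name__ == "__main__":
    features = [
        Feature(1, "forest", 0, 0, 25, 5),     # crosses several grid cells
        Feature(2, "water", 12, 12, 18, 18),
        Feature(3, "forest", 40, 40, 50, 50),  # outside the region
    ]
    print(grid_mapreduce_stats(features, region=(5, 0, 20, 20)))
    # {'forest': 75.0, 'water': 36.0}
```

In a real Hadoop deployment each grid block would presumably be its own file or input split handled by a separate map task, and the exact step would use a polygon overlay library instead of rectangle clipping; both are stand-ins here. The result should not depend on the cell size, only the amount of work skipped by the first filter does.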
Received: 2016-12-08   Published: 2018-04-09
CLC:  P208  
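Funding: National Natural Science Foundation of China (41471313, 41671391); Special Project of National Science and Technology Basic Work (2012FY112300); National Marine Public Welfare Research Project (201505003); Zhejiang Provincial Science and Technology Key Research Program (2015C33021).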
Corresponding author: ZHANG Feng, ORCID: http://orcid.org/0000-0003-1475-8480, E-mail: zfcarnation@zju.edu.cn.
About the first author: LIN Yaping (1992-), ORCID: http://orcid.org/0000-0002-9324-7293, female, M.S., research interests: geographical conditions and cloud computing.
Cite this article:

LIN Yaping, DU Zhenhong, ZHANG Feng, LIU Renyi. Research on the analysis and statistic of geographical conditions based on the strategy of “Grid Index + MapReduce”[J]. Journal of Zhejiang University (Science Edition), 2017, 44(6): 660-665.

Link to this article:

https://www.zjujournals.com/sci/CN/10.3785/j.issn.1008-9497.2017.06.004
https://www.zjujournals.com/sci/CN/Y2017/V44/I6/660
