Front. Inform. Technol. Electron. Eng.  2016, Vol. 17 Issue (1): 15-31    DOI: 10.1631/FITEE.1500015
    
Dr. Hadoop: an infinite scalable metadata management for Hadoop—How the baby elephant becomes immortal
Dipayan DEV, Ripon PATGIRI
Department of Computer Science and Engineering, NIT Silchar, India
Abstract:

In this exabyte-scale era, data grows at an exponential rate, and this growth in turn generates a massive amount of metadata in the file system. Hadoop is the most widely used framework for dealing with big data, but under such metadata growth its efficiency has been questioned repeatedly by researchers. It is therefore essential to create efficient and scalable metadata management for Hadoop. Hash-based mapping and subtree partitioning are the two common schemes for distributed metadata management. Subtree partitioning does not distribute the workload uniformly among the metadata servers, so metadata must be migrated to keep the load roughly balanced. Hash-based mapping distributes the load uniformly among the NameNodes, the metadata servers of Hadoop, but constrains metadata locality. In this paper, we present a circular metadata management mechanism named dynamic circular metadata splitting (DCMS). DCMS preserves metadata locality using consistent hashing together with locality-preserving hashing, keeps replicated metadata for reliability, and dynamically distributes metadata among the NameNodes to balance the load. The NameNode is the centralized heart of Hadoop: it keeps the directory tree of all files, so its failure is a single point of failure (SPOF). DCMS removes Hadoop's SPOF and provides efficient and scalable metadata management. The new framework is named 'Dr. Hadoop' after the names of the authors.

Key words: Hadoop; NameNode; Metadata; Locality-preserving hashing; Consistent hashing
Received: 2015-01-12    Published: 2016-01-05
CLC:  TP311  
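
To make the DCMS placement idea concrete, below is a minimal Python sketch of consistent hashing with directory-level keys and r-way replication, in the spirit of the abstract's description. It is our illustration, not the paper's implementation: the NameNode names, the use of SHA-1, and hashing a file's parent directory as a crude stand-in for locality-preserving hashing are all assumptions.

```python
import bisect
import hashlib

class DCMSRing:
    """Illustrative DCMS-style metadata placement: NameNodes sit on a
    consistent-hash ring; a file's key is derived from its parent directory
    so entries of one directory stay on the same server, and each key is
    replicated on the next r - 1 NameNodes clockwise around the ring."""

    def __init__(self, namenodes, r=3):
        self.r = r
        # Place each NameNode on the ring at hash(node_id).
        self.ring = sorted((self._hash(n), n) for n in namenodes)

    @staticmethod
    def _hash(key):
        # SHA-1 keeps the ring stable across runs; any uniform hash works.
        return int(hashlib.sha1(key.encode()).hexdigest(), 16)

    def locate(self, path):
        """Return the r NameNodes responsible for the metadata of `path`."""
        parent = path.rsplit('/', 1)[0] or '/'   # directory-level key for locality
        i = bisect.bisect(self.ring, (self._hash(parent), '')) % len(self.ring)
        return [self.ring[(i + k) % len(self.ring)][1] for k in range(self.r)]

ring = DCMSRing(['nn1', 'nn2', 'nn3', 'nn4', 'nn5'])
print(ring.locate('/user/alice/part-00000'))
print(ring.locate('/user/alice/part-00001'))  # same directory -> same NameNodes
```

Because the key is taken from the parent directory, listing a directory touches a single NameNode, while distinct directories spread uniformly around the ring; adding a NameNode remaps only the keys between it and its predecessor, the usual consistent-hashing property.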
Comparison of traditional Hadoop and Dr. Hadoop:

Parameter                                                  Traditional Hadoop    Dr. Hadoop
Maximum number of NameNode crashes that can be survived    0                     r − 1
Number of RPCs needed for a read operation                 1                     1
Number of RPCs needed for a write operation                1                     r
Metadata storage per NameNode                              X                     (X/m)r
Throughput of metadata reads                               X                     Xm
Throughput of metadata writes                              X                     X(m/r)

(m: number of NameNodes in the DCMS ring; r: metadata replication factor; X: the corresponding value for the single NameNode of traditional Hadoop)
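
As a back-of-the-envelope reading of the table, the following snippet plugs sample values into the formulas; X, m, and r here are arbitrary numbers chosen for illustration, not measurements from the paper.

```python
# X: per-NameNode storage/throughput baseline of traditional Hadoop (arbitrary units),
# m: number of NameNodes in the DCMS ring, r: metadata replication factor.
X, m, r = 100.0, 5, 3

storage_per_namenode = (X / m) * r   # own shard plus r - 1 neighbour replicas
read_throughput      = X * m         # all m NameNodes serve reads in parallel
write_throughput     = X * (m / r)   # every write must reach r replicas
survivable_crashes   = r - 1         # metadata is lost only if all r replicas fail

print(storage_per_namenode, read_throughput, write_throughput, survivable_crashes)
```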
  
Trace        Number of files    Data size (GB)    Metadata extracted (MB)
Yahoo        8 139 723          256.8             596.4
Microsoft    7 725 928          223.7             416.2
  
  
  
Namespace growth for the Yahoo trace:

Data (GB)    Namespace (MB)
10           29.80
20           53.13
30           77.92
40           96.18
50           119.02
60           142.11
70           159.17
80           181.67
90           203.09
100          229.37
110          251.04
120          279.30
130          299.82
140          318.33
150          337.22
160          356.71
170          373.18
180          394.22
190          426.13
200          454.76
210          481.01
220          512.16
230          536.92
240          558.23
250          570.12
256.8        594.40
  
Namespace growth for the Microsoft trace:

Data (GB)    Namespace (MB)
10           20.50
20           41.83
30           59.03
40           77.08
50           103.07
60           128.18
70           141.90
80           157.19
90           174.34
100          190.18
110          214.20
120          237.43
130          254.12
140          271.89
150          297.12
160          312.71
170          329.11
180          352.12
190          369.77
200          384.76
210          393.04
220          401.86
223.7        416.02
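
Taken together, the two tables show the namespace growing roughly linearly with data volume: about 594.40 MB / 256.8 GB ≈ 2.3 MB of metadata per GB for the Yahoo trace and 416.02 MB / 223.7 GB ≈ 1.9 MB per GB for the Microsoft trace.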
  
Data (GB)    Namespace (MB)
             Yahoo        Microsoft
100          684.11       572.65
200          1364.90      1152.89
256.8        1789.34      1248.02
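
These totals are close to three times the single-copy namespace sizes above (3 × 594.40 MB = 1783.2 MB for Yahoo; 3 × 416.02 MB = 1248.06 MB for Microsoft), which is consistent with DCMS keeping r = 3 replicas of each metadata entry; this reading is inferred from the numbers, not stated in this excerpt.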
  
  
  
  
  