Front. Inform. Technol. Electron. Eng.  2016, Vol. 17 Issue (1): 15-31    DOI: 10.1631/FITEE.1500015
Original article     
Dr. Hadoop: an infinite scalable metadata management for Hadoop—How the baby elephant becomes immortal
Dipayan DEV, Ripon PATGIRI
Department of Computer Science and Engineering, NIT Silchar, India

Abstract  

In this exabyte-scale era, data increases at an exponential rate, which in turn generates a massive amount of metadata in the file system. Hadoop is the most widely used framework for dealing with big data. Owing to this growth of metadata, however, the efficiency of Hadoop has been questioned repeatedly by researchers. Therefore, it is essential to create efficient and scalable metadata management for Hadoop. Hash-based mapping and subtree partitioning are two suitable approaches to distributed metadata management. Subtree partitioning does not distribute the workload uniformly among the metadata servers, so metadata needs to be migrated to keep the load roughly balanced. Hash-based mapping distributes the load uniformly among the NameNodes, the metadata servers of Hadoop, but it constrains metadata locality. In this paper, we present a circular metadata management mechanism named dynamic circular metadata splitting (DCMS). DCMS preserves metadata locality using consistent hashing and locality-preserving hashing, keeps replicated metadata for excellent reliability, and dynamically distributes metadata among the NameNodes to balance the load. The NameNode is the centralized heart of Hadoop: it keeps the directory tree of all files, so its failure constitutes a single point of failure (SPOF). DCMS removes Hadoop's SPOF and provides efficient and scalable metadata management. The new framework is named 'Dr. Hadoop' after the names of the authors.
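To illustrate the consistent-hashing lookup that DCMS builds on, the following Java sketch shows how a file path can be mapped to the NameNode responsible for its metadata. This is a minimal illustration, not the paper's implementation: the class and method names are ours, and keying the lookup on the parent directory (so that all entries of a directory land on the same NameNode) is only an assumption about how locality-preserving hashing is applied.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.TreeMap;

// Minimal consistent-hashing ring over NameNodes (illustrative sketch).
public class DcmsRingSketch {
    private final TreeMap<Long, String> ring = new TreeMap<>(); // node hash -> NameNode URL

    public void addNameNode(String nameNodeUrl) {
        ring.put(hash(nameNodeUrl), nameNodeUrl);
    }

    // Returns the NameNode responsible for the metadata of a file path.
    public String lookup(String filePath) {
        int slash = filePath.lastIndexOf('/');
        String parentDir = slash > 0 ? filePath.substring(0, slash) : "/";
        Long key = ring.ceilingKey(hash(parentDir));            // first node clockwise
        return ring.get(key != null ? key : ring.firstKey());   // wrap around the circle
    }

    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-1")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xffL);
            return h;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}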



Key words: Hadoop; NameNode; Metadata; Locality-preserving hashing; Consistent hashing
Received: 12 January 2015      Published: 05 January 2016
CLC:  TP311  
Corresponding Authors: Dipayan DEV     E-mail: dev.dipayan16@gmail.com; ripon@cse.nits.ac.in
Cite this article:

Dipayan DEV,Ripon PATGIRI. Dr. Hadoop: an infinite scalable metadata management for Hadoop—How the baby elephant becomes immortal. Front. Inform. Technol. Electron. Eng., 2016, 17(1): 15-31.

URL:

http://www.zjujournals.com/xueshu/fitee/10.1631/FITEE.1500015     OR     http://www.zjujournals.com/xueshu/fitee/Y2016/V17/I1/15



Fig. 1 The skeleton of a NameNode server in DCMS
Fig. 2 Key-value system. Key is SHA1 (pathname), while the value is its corresponding metadata
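A hedged reading of Fig. 2 in code: each pathname is hashed with SHA-1 and the digest is used as the key under which the metadata record is stored. MetaRecord and its fields are placeholders we invented; the paper's actual metadata layout is not reproduced here.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Hashtable;

// Key-value view of the namespace: SHA-1(pathname) -> metadata record (sketch).
public class KeyValueSketch {
    static class MetaRecord { String owner; long size; long mtime; } // placeholder fields

    private final Hashtable<String, MetaRecord> table = new Hashtable<>();

    static String sha1Hex(String pathname) throws Exception {
        byte[] d = MessageDigest.getInstance("SHA-1")
                .digest(pathname.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : d) hex.append(String.format("%02x", b & 0xff));
        return hex.toString();
    }

    void put(String pathname, MetaRecord meta) throws Exception {
        table.put(sha1Hex(pathname), meta);   // key is SHA-1(pathname)
    }

    MetaRecord get(String pathname) throws Exception {
        return table.get(sha1Hex(pathname));
    }
}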
Fig. 3 System architecture. Physical NameNode servers compose a metadata cluster to form a DCMS overlay network. The core in-memory data structures of a NameNode listed alongside the figure are:

/* File information: maps a file or directory path to its metadata */
public static Hashtable<FilePath, MetaData> fileInfo = new Hashtable<FilePath, MetaData>();

/* Cluster information: maps a NameNode URL to its ClusterInfo object */
public static Hashtable<String, ClusterInfo> clusterinfo = new Hashtable<String, ClusterInfo>();

/* Registry of all NameNode servers: maps a NameNode hostname to its URL */
public Hashtable<String, String> namenode = new Hashtable<String, String>();
Fig. 4 Replication management of Dr. Hadoop in DCMS
Algorithm 1 Write operation of Dr. Hadoop in DCMS
Algorithm 2 Read operation of Dr. Hadoop in DCMS
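Algorithms 1 and 2 themselves are not reproduced on this page, but their RPC costs in Table 1 (one RPC per read, r RPCs per write) suggest the following shape. This is only a sketch under our assumptions: the NameNodeRpc interface and successor-based replica placement are ours, not the paper's code.

import java.util.List;

// Sketch of the read/write flow implied by Table 1: 1 RPC per read, r RPCs per write.
public class DcmsClientSketch {
    interface NameNodeRpc {                       // placeholder RPC interface
        void putMetadata(String key, byte[] metadata);
        byte[] getMetadata(String key);
    }

    private final List<NameNodeRpc> ringOrder;    // NameNodes in ring order
    private final int r;                          // metadata replication factor

    DcmsClientSketch(List<NameNodeRpc> ringOrder, int r) {
        this.ringOrder = ringOrder;
        this.r = r;
    }

    // Write: store on the primary NameNode and its r-1 successors on the ring (r RPCs).
    void write(int primaryIndex, String key, byte[] metadata) {
        for (int i = 0; i < r; i++) {
            ringOrder.get((primaryIndex + i) % ringOrder.size()).putMetadata(key, metadata);
        }
    }

    // Read: a single RPC to the primary NameNode that owns the key.
    byte[] read(int primaryIndex, String key) {
        return ringOrder.get(primaryIndex).getMetadata(key);
    }
}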
Fig. 5 Client accessing files on the Dr. Hadoop framework
Fig. 6 Dynamic nature of Dr. Hadoop: (a) initial setup of DCMS with half-filled RAM; (b) addition of a NameNode in DCMS when the RAM gets filled up
Algorithm 3 Node join operation of Dr. Hadoop: addnew_NameNode_N()
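Algorithm 3 (addnew_NameNode_N()) is invoked when an existing NameNode's RAM fills up (Fig. 6). A minimal consistent-hashing join, under the assumption that the new node takes over the part of its successor's key range that now hashes to it, might look like the sketch below; the names and the key-migration detail are ours, and wrap-around corner cases are ignored.

import java.util.Map;
import java.util.TreeMap;

// Sketch of a node-join step on a consistent-hashing ring (illustrative only).
public class NodeJoinSketch {
    // node hash -> the metadata keys that node currently owns
    private final TreeMap<Long, TreeMap<Long, byte[]>> ring = new TreeMap<>();

    void addNewNameNode(long newNodeHash) {
        Map.Entry<Long, TreeMap<Long, byte[]>> successor = ring.ceilingEntry(newNodeHash);
        if (successor == null) successor = ring.firstEntry();        // wrap around
        TreeMap<Long, byte[]> moved = new TreeMap<>();
        if (successor != null) {
            // keys hashing at or below the new node's position move to the new node
            moved.putAll(successor.getValue().headMap(newNodeHash, true));
            successor.getValue().keySet().removeAll(moved.keySet());
        }
        ring.put(newNodeHash, moved);
    }
}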
Parameter | Traditional Hadoop | Dr. Hadoop
Maximum number of NameNode crashes that can be survived | 0 | r − 1
Number of RPCs needed for a read operation | 1 | 1
Number of RPCs needed for a write operation | 1 | r
Metadata storage per NameNode | X | (X/r)m
Throughput of metadata read | X | Xm
Throughput of metadata write | X | X(m/r)
Table 1 Analytical comparison of traditional Hadoop and Dr. Hadoop
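To make Table 1 concrete, here is a small worked reading of its formulas; interpreting X as the single-NameNode baseline, m as the number of NameNodes in the DCMS cluster, and r as the metadata replication factor is our assumption from context. With m = 10 NameNodes and r = 3, a read still costs a single RPC while a write costs r = 3 RPCs; aggregate metadata read throughput becomes Xm = 10X, write throughput becomes X(m/r) ≈ 3.3X, and the cluster can survive up to r − 1 = 2 simultaneous NameNode crashes instead of none.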
Trace | Number of files | Data size (GB) | Metadata extracted (MB)
Yahoo | 8 139 723 | 256.8 | 596.4
Microsoft | 7 725 928 | 223.7 | 416.2
Table 2 Real data traces
Fig. 7 Locality comparisons of paths at three levels over two traces in the cluster with 10 NameNodes: (a) Yahoo trace; (b) Microsoft Windows trace
Fig. 8 Linear growth of metadata load per NameNode for Hadoop and Dr. Hadoop: (a) Microsoft Windows trace; (b) Yahoo trace
Data (GB) | Namespace (MB)
10 | 29.80
20 | 53.13
30 | 77.92
40 | 96.18
50 | 119.02
60 | 142.11
70 | 159.17
80 | 181.67
90 | 203.09
100 | 229.37
110 | 251.04
120 | 279.30
130 | 299.82
140 | 318.33
150 | 337.22
160 | 356.71
170 | 373.18
180 | 394.22
190 | 426.13
200 | 454.76
210 | 481.01
220 | 512.16
230 | 536.92
240 | 558.23
250 | 570.12
256.8 | 594.40
Table 3 Incremental data storage vs. namespace size for the traditional Hadoop cluster (Yahoo trace)
Data (GB) | Namespace (MB)
10 | 20.50
20 | 41.83
30 | 59.03
40 | 77.08
50 | 103.07
60 | 128.18
70 | 141.90
80 | 157.19
90 | 174.34
100 | 190.18
110 | 214.20
120 | 237.43
130 | 254.12
140 | 271.89
150 | 297.12
160 | 312.71
170 | 329.11
180 | 352.12
190 | 369.77
200 | 384.76
210 | 393.04
220 | 401.86
223.7 | 416.02
Table 4 Incremental data storage vs. namespace size for the traditional Hadoop cluster (Microsoft trace)
Data (GB) | Namespace, Yahoo (MB) | Namespace, Microsoft (MB)
100 | 684.11 | 572.65
200 | 1364.90 | 1152.89
256.8 | 1789.34 | 1248.02
Table 5 Data storage vs. namespace size for the Dr. Hadoop cluster
Fig. 9 Comparison of throughput under different load conditions: (a) Yahoo trace; (b) Microsoft Windows trace
Fig. 10 Fault tolerance of Hadoop and Dr. Hadoop: (a) Yahoo trace; (b) Microsoft Windows trace
Fig. 11 Migration overhead
Fig. 12 Wordcount job execution time with different dataset sizes