ZHANG Qi-fei1, ZHANG Wei-dong2, LI Wen-juan1,3, PAN Xue-zeng1, SHEN Yan1
1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China;
2. College of Software, Zhejiang University, Ningbo 315103, China;
3. College of Qianjiang, Hangzhou Normal University, Hangzhou 310036, China
A novel P2P-based distributed cloud storage system is proposed to address the excessive delay that distributed file systems with a master/slave structure incur when manipulating small files. By improving the Chord routing algorithm and adding a central routing node that stores the status and routing information of all nodes, the time complexity of a resource query is reduced to O(1). Clients can also prefetch the data held on the central routing node, which further reduces the time overhead. A backup strategy with three replicas is used to ensure data reliability. The system implements a series of basic functions, such as write, read, delete, and list directory. Experimental results show that the time needed to manipulate small files is reduced by an order of magnitude compared with Hadoop HDFS.
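The routing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `CentralRouter`, the 32-bit identifier ring, the SHA-1 hash, and the rule of placing the three replicas on consecutive successor nodes are all assumptions made for illustration. The point it shows is that a client needs only one request to the central node to learn where a file lives, instead of the multi-hop lookup of standard Chord. The in-memory search at the central node is a binary search over the sorted node table.

```python
import bisect
import hashlib

RING_BITS = 32       # assumed identifier width; the abstract does not state one
REPLICA_COUNT = 3    # replica number given in the abstract


def ring_id(name: str) -> int:
    """Map a node name or file path onto the identifier ring (assumed SHA-1)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:RING_BITS // 8], "big")


class CentralRouter:
    """Hypothetical central routing node that knows every storage node."""

    def __init__(self):
        self._ids = []     # sorted ring identifiers of registered nodes
        self._names = {}   # ring identifier -> node name

    def join(self, node_name: str) -> None:
        """Register a storage node in the routing table."""
        nid = ring_id(node_name)
        if nid not in self._names:
            bisect.insort(self._ids, nid)
            self._names[nid] = node_name

    def leave(self, node_name: str) -> None:
        """Remove a storage node from the routing table."""
        nid = ring_id(node_name)
        if self._names.get(nid) == node_name:
            del self._names[nid]
            self._ids.remove(nid)

    def locate(self, path: str) -> list:
        """Return the successor of `path` on the ring plus the nodes after it."""
        if not self._ids:
            raise LookupError("no storage nodes registered")
        start = bisect.bisect_left(self._ids, ring_id(path))
        count = min(REPLICA_COUNT, len(self._ids))
        total = len(self._ids)
        return [self._names[self._ids[(start + i) % total]] for i in range(count)]
```

A client would call `locate` once per file and then contact the returned nodes directly. Under the same assumptions, a client that caches a copy of the node table could resolve locations locally, which is one way to read the prefetching mentioned in the abstract.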