J4  2013, Vol. 47 Issue (1): 8-14    DOI: 10.3785/j.issn.1008-973X.2013.01.002
    
Cloud storage system for small file based on P2P
ZHANG Qi-fei1, ZHANG Wei-dong2, LI Wen-juan1,3, PAN Xue-zeng1, SHEN Yan1
1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China;
2. College of Software, Zhejiang University, Ningbo 315103, China;
3. College of Qianjiang, Hangzhou Normal University, Hangzhou 310036, China

Abstract  

A novel P2P-based distributed cloud storage system was proposed to address the excessive latency of distributed file systems with a master/slave structure when handling small files. The time complexity of a resource query was reduced to O(1) by improving the Chord routing algorithm and adding a central routing node that stores the status and routing information of all nodes. Clients can also prefetch the data held on the central routing node, which further reduces the time overhead. A backup strategy with three replicas was adopted to ensure data reliability. The system implements a set of basic functions, including write, read, delete, and list directory. Experimental results show that the time taken to manipulate small files is reduced by an order of magnitude compared with Hadoop HDFS.
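
The lookup idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a Chord-style identifier ring, a SHA-1 hash truncated to 32 bits, and replica placement on the owning node plus the next nodes clockwise; the abstract specifies none of these details. The names `ring_id`, `CentralRoutingTable`, and `locate` are hypothetical. Because the full table is held in one place (on the central routing node, or on a client that has prefetched it), a file is located with a single local search and no multi-hop forwarding between peers. The local binary search still costs O(log N) comparisons, so the constant-time claim is best read as counting network hops.

```python
import bisect
import hashlib

RING_BITS = 32  # assumed size of the identifier space; not taken from the paper


def ring_id(key: str) -> int:
    """Map a string to a position on the identifier ring."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[: RING_BITS // 8], "big")


class CentralRoutingTable:
    """Hypothetical full routing table, as the central routing node
    (or a client that has prefetched it) might hold it."""

    def __init__(self, node_names, replicas=3):
        self.replicas = replicas  # the abstract states a replica number of 3
        self.ring = sorted((ring_id(name), name) for name in node_names)
        self.ids = [node_id for node_id, _ in self.ring]

    def locate(self, file_name):
        """Return the nodes assumed to hold file_name: the successor of the
        file's ring position, followed by the next nodes clockwise."""
        start = bisect.bisect_left(self.ids, ring_id(file_name)) % len(self.ring)
        count = min(self.replicas, len(self.ring))
        return [self.ring[(start + k) % len(self.ring)][1] for k in range(count)]


if __name__ == "__main__":
    table = CentralRoutingTable([f"node-{i}" for i in range(8)])
    print(table.locate("small-file.txt"))
```

In plain Chord, each peer knows only a logarithmic number of other peers, so a query is forwarded over roughly O(log N) hops. The sketch trades that for a table that must be kept up to date in one place, which is the design choice the abstract describes.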



Published: 01 January 2013
CLC:  TP 338.8  
Cite this article:

ZHANG Qi-fei, ZHANG Wei-dong, LI Wen-juan, PAN Xue-zeng, SHEN Yan. Cloud storage system for small file based on P2P. J4, 2013, 47(1): 8-14.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2013.01.002     OR     http://www.zjujournals.com/eng/Y2013/V47/I1/8


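A peer-to-peer network based cloud storage system for small files

To address the excessive latency of current master/slave cloud storage systems when storing small files, a distributed cloud storage system based on a peer-to-peer (P2P) network is proposed. An improved Chord routing algorithm raises the efficiency of resource queries. A central routing node is introduced that stores the routing and status information of every node in the system, which reduces the time complexity of a resource query to O(1); clients prefetch the data on the central routing node, which reduces the time overhead of data operations. The system ensures data reliability through a backup strategy, and the number of data replicas in the implementation is 3. The system implements basic operations including file storage, reading, deletion, and directory listing. Experimental results show that, compared with the Hadoop HDFS file system, the small-file operation time of this system is reduced by an order of magnitude.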

