Research overview of big data technology
LIU Zhi-hui, ZHANG Quan-ling
Institute of Cyber-systems and Control, Zhejiang University, Hangzhou 310027, China


Abstract: The emergence of “big data” has brought new challenges to technologies for processing massive amounts of information. This overview examined big data from three aspects: its concepts and characteristics, the general data processing framework, and key techniques. The background of big data was explained, and its basic concepts, typical “4V” characteristics, and related application fields were outlined. Next, the general procedure of big data processing was summarized, and key techniques such as MapReduce, GFS, BigTable, Hadoop, and data visualization were analyzed and described. Finally, new issues and challenges of the big data era were identified.
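Among the key techniques listed above is MapReduce. As a rough illustration of its programming model (a map step that emits key/value pairs, a shuffle that groups values by key, and a reduce step that combines each group), the following single-process Python sketch counts words. It is an illustrative assumption only: the function names are hypothetical, and it does not use Hadoop or any distributed runtime.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit an intermediate (word, 1) pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group intermediate values by key, as a framework would between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values that share a key into a single result."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    # On a real cluster each split would be mapped on a different node;
    # in this sketch the three phases simply run in sequence in one process.
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    splits = ["big data needs new tools", "Hadoop implements MapReduce for big data"]
    print(word_count(splits))
```

Because the map and reduce functions are independent per split and per key, a framework can run many copies of them in parallel; only the shuffle step requires moving data between workers.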

Published: 01 April 2015
CLC: TP 391; TP 311
Cite this article:

LIU Zhi-hui, ZHANG Quan-ling. Research overview of big data technology. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2014, 48(6): 957-972.
