JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE)
Computer Technology     
Cloud resource scheduling algorithm with failure recovery mechanism
QI Ping1,2, LI Long shu1, LI Xue jun1
1. Department of Computer Science and Technology, Anhui University, Hefei 230039, China; 2. Department of Mathematics and Computer Science, Tongling University, Tongling 244000, China

Abstract  

To address the low reliability of Cloud services, a task scheduling model with a failure recovery mechanism was proposed. The failure recovery mechanism was used to analyze the behavior of nodes, and interaction failures between nodes were classified as either recoverable or unrecoverable. Drawing on social trust relationships, the trustworthiness of computing nodes was quantified and evaluated to build a more practical Cloud service reliability model, in which resource owners can freely adjust the limit on the number of recoveries and the recovery probability. The trustworthiness of nodes was then integrated into the existing dynamic level scheduling (DLS) algorithm to obtain FR DLS, a DLS algorithm that accounts for failure recovery. FR DLS takes the trust degree of Cloud service resources into account when calculating the scheduling level of each task-resource pair, so that tasks are assigned to trusted nodes and executed efficiently. To evaluate the algorithm, a simulation platform based on CloudSim was developed in a PlanetLab environment. Theoretical analysis and simulation results show that FR DLS effectively improves the task success rate in a Cloud environment at the cost of a relatively small increase in execution time and schedule length. As the numbers of nodes and tasks grow, the gain in reliability far exceeds the added cost in execution time and schedule length, which supports the practicality of the algorithm in large-scale Cloud environments.
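The abstract does not give the trust formula or the exact way trust enters the scheduling level, so the following is only a minimal sketch under stated assumptions. It uses the standard DLS dynamic level (static level minus the earliest start time, plus a speed adjustment term) and a beta-style expected trust value. The names `NodeHistory`, `trust`, `trusted_level`, `pick_node`, the choice to count recovered failures as successes, and the linear `penalty` term are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class NodeHistory:
    """Hypothetical interaction record for one computing node."""
    successes: int    # interactions that completed without any failure
    recovered: int    # recoverable failures that were repaired in time
    unrecovered: int  # unrecoverable failures, or recoveries that ran out


def trust(h: NodeHistory) -> float:
    """Beta-style expected trust in (0, 1).

    Assumption: a recovered failure counts as a positive outcome.
    """
    good = h.successes + h.recovered
    bad = h.unrecovered
    return (good + 1) / (good + bad + 2)


def dynamic_level(static_level, data_ready, node_ready, delta):
    """Standard DLS level: static level - earliest start time + speed adjustment."""
    return static_level - max(data_ready, node_ready) + delta


def trusted_level(dl, node_trust, penalty=10.0):
    """Hypothetical adjustment: subtract a penalty that grows as trust falls."""
    return dl - penalty * (1.0 - node_trust)


def pick_node(static_level, candidates, penalty=10.0):
    """Pick the node with the highest trust-adjusted level for one ready task.

    candidates maps node name -> (data_ready, node_ready, delta, NodeHistory).
    """
    def score(item):
        data_ready, node_ready, delta, history = item[1]
        dl = dynamic_level(static_level, data_ready, node_ready, delta)
        return trusted_level(dl, trust(history), penalty)

    return max(candidates.items(), key=score)[0]


candidates = {
    "fast_flaky": (2.0, 1.0, 1.0, NodeHistory(2, 0, 6)),
    "slow_reliable": (4.0, 3.0, 0.0, NodeHistory(17, 1, 0)),
}
chosen = pick_node(20.0, candidates)
```

With these made-up numbers, the plain dynamic level favors `fast_flaky`, while the trust-adjusted level favors `slow_reliable`. This mirrors the trade-off described in the abstract: a somewhat later start in exchange for a higher chance that the task succeeds.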



Published: 31 December 2015
CLC:  TP 393  
Cite this article:

QI Ping, LI Long shu, LI Xue jun. Cloud resource scheduling algorithm with failure recovery mechanism. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2015, 49(12): 2305-2315.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2015.12.010     OR     http://www.zjujournals.com/eng/Y2015/V49/I12/2305


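Cloud resource scheduling algorithm with failure recovery mechanism

To address the low reliability of cloud services, a task scheduling model that considers a node failure recovery mechanism is proposed. The model introduces the failure recovery mechanism to analyze the behavioral characteristics of nodes and divides interaction failures between nodes into recoverable and unrecoverable failures. With reference to the interpersonal trust model from sociology, the trustworthiness of nodes under the failure recovery mechanism is quantified and evaluated to build a more realistic cloud service reliability model, and resource nodes are allowed to adjust the limit on the number of failure recoveries and the failure recovery rate themselves. Incorporating node trustworthiness into the DLS algorithm yields the dynamic level scheduling algorithm with failure recovery (FR DLS). When computing scheduling levels, FR DLS fully considers the trustworthiness of service resources, so that application tasks can be effectively assigned to trusted resource nodes. To evaluate the proposed algorithm, a CloudSim-based simulation platform was designed in a PlanetLab environment. Analysis and simulation results show that FR DLS effectively improves the success rate of tasks executed in a cloud environment while sacrificing only a small amount of task completion time and schedule length; as the numbers of resource nodes and application tasks in the cloud keep increasing, the algorithm's gain in reliability is far greater than its cost in task completion time and schedule length, which fully demonstrates its practicality in large-scale cloud environments.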

