Journal of Zhejiang University (Engineering Science)
Computer Technology
Cloud resource scheduling algorithm with failure recovery mechanism
QI Ping1,2, LI Long-shu1, LI Xue-jun1
1. Department of Computer Science and Technology, Anhui University, Hefei 230039, China; 2. Department of Mathematics and Computer Science, Tongling University, Tongling 244000, China
Abstract:

To address the low reliability of cloud services, a task scheduling model that accounts for a node failure recovery mechanism is proposed. The model uses the failure recovery mechanism to analyze the behavior of nodes and classifies interaction failures between nodes as either recoverable or unrecoverable. Drawing on the interpersonal trust model from sociology, the trustworthiness of nodes under the failure recovery mechanism is quantified and evaluated, which gives a more realistic cloud service reliability model in which resource nodes can adjust their own limit on the number of recoveries and their recovery rate. Integrating node trustworthiness into the existing dynamic level scheduling (DLS) algorithm yields a DLS algorithm that considers failure recovery, named FR-DLS. FR-DLS takes the trust degree of the service resources into account when calculating the scheduling level of task-resource pairs, so that tasks are assigned to trustworthy resource nodes. To evaluate the proposed algorithm, a CloudSim-based simulation platform was built in a PlanetLab environment. Analysis and simulation results show that FR-DLS improves the task success rate in a cloud environment at the cost of a relatively small increase in task completion time and schedule length. As the numbers of resource nodes and tasks grow, the gain in reliability far exceeds the added cost in completion time and schedule length, which indicates that the algorithm is practical for large-scale cloud environments.
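The idea of folding node trustworthiness into the scheduling level can be illustrated with a small sketch. The abstract does not give the FR-DLS formulas, so everything below is an assumption made for illustration: the dynamic level follows the classic DLS form (static level minus earliest start time plus a relative-speed term), the trust estimate uses a Beta-reputation-style expectation, a recovered failure is split between positive and negative evidence by a hypothetical `recovery_credit` weight, and trust is applied as a multiplicative factor. All names (`NodeHistory`, `trust`, `trusted_dynamic_level`, `pick_node`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeHistory:
    """Hypothetical interaction record for one resource node."""
    successes: int                 # interactions that completed without failure
    recovered: int                 # recoverable failures that were repaired
    unrecovered: int               # failures that could not be recovered
    recovery_credit: float = 0.5   # assumed share of a recovered failure counted as positive


def trust(h: NodeHistory) -> float:
    """Beta-reputation-style expected trust (assumed form, not the paper's formula)."""
    positive = h.successes + h.recovery_credit * h.recovered
    negative = h.unrecovered + (1.0 - h.recovery_credit) * h.recovered
    return (positive + 1.0) / (positive + negative + 2.0)


def dynamic_level(static_level, data_ready, node_free, speed_delta):
    """Classic DLS dynamic level: static level - earliest start time + speed term."""
    return static_level - max(data_ready, node_free) + speed_delta


def trusted_dynamic_level(static_level, data_ready, node_free, speed_delta, history):
    """Assumed trust-aware variant: scale the dynamic level by the node's trust.

    The multiplicative form only ranks sensibly while the dynamic level is positive.
    """
    return trust(history) * dynamic_level(static_level, data_ready, node_free, speed_delta)


def pick_node(candidates):
    """Return the name of the candidate with the highest trust-aware level.

    Each candidate is (name, static_level, data_ready, node_free, speed_delta, history).
    """
    return max(candidates, key=lambda c: trusted_dynamic_level(*c[1:]))[0]


candidates = [
    # Faster node with a poor record: plain dynamic level 8, trust 0.5.
    ("fast", 10.0, 2.0, 3.0, 1.0, NodeHistory(successes=5, recovered=0, unrecovered=5)),
    # Slower node whose failures were all recovered: plain dynamic level 6.
    ("steady", 10.0, 2.0, 4.0, 0.0, NodeHistory(successes=8, recovered=2, unrecovered=0)),
]
best = pick_node(candidates)
```

With these made-up numbers, plain DLS would prefer the node named `fast`, while the trust-weighted level prefers `steady`. This is the kind of trade the abstract describes, where a little schedule length is given up for a higher chance that the task succeeds.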

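Published: 2015-12-31
CLC number: TP 393
Funding:

National Natural Science Foundation of China, Young Scientists Fund (61402005).

Corresponding author: LI Long-shu, male, professor. ORCID: 0000-0003-0492-4289. E-mail: lilongshu@163.com
About the author: QI Ping (born 1981), male, Ph.D., lecturer, works on distributed computing. ORCID: 0000-0001-6049-9729. E-mail: qiping929@gmail.com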

Cite this article:

QI Ping, LI Long-shu, LI Xue-jun. Cloud resource scheduling algorithm with failure recovery mechanism [J]. Journal of Zhejiang University (Engineering Science), 10.3785/j.issn.1008-973X.2015.12.010.

Link to this article:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2015.12.010
http://www.zjujournals.com/eng/CN/Y2015/V49/I12/2305

