Deep reinforcement learning models and algorithms for single-machine scheduling considering deteriorated maintenance

Journal of Zhejiang University (Engineering Science), 2026, Vol. 60, Issue (7): 1528-1538

DOI: 10.3785/j.issn.1008-973X.2026.07.015

Mechanical Engineering

Yong CHEN, Xizhi DU, Yiwei JIANG, Wenchao YI*, Zhi PEI, Zuzhen JI

College of Mechanical Engineering, Zhejiang University of Technology, Hangzhou 310023, China


Abstract:

A multi-stage machine state model was proposed for the single-machine scheduling problem under deterioration effects and maintenance strategies, with the objective of minimizing total production cost. A state transition mechanism was designed to incorporate both deterioration evolution and maintenance effects. Job tardiness cost, machine operating cost, and maintenance cost were jointly considered to make the entire production process more economical and efficient. An integrated decision-making framework for scheduling and maintenance was built on deep reinforcement learning, in which the agent was trained through interaction with the environment to learn an optimized policy, so that job sequencing and maintenance timing were decided jointly in a complex dynamic system. Instances of various scales were designed, and the effectiveness of the proposed framework and model was validated through computational experiments. The comparative results indicate that the proposed framework and algorithm achieve lower total scheduling and maintenance cost than several integrated optimization strategies. The conflict between job scheduling and equipment maintenance is effectively coordinated, and integrated scheduling and maintenance policies can be learned and applied in dynamic and uncertain environments.
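The cost structure named in the abstract (tardiness, operating, and maintenance cost over a multi-stage machine state) can be illustrated with a toy evaluator. The sketch below is not the authors' model: the stage multipliers, the cost rates, the one-stage-per-job deterioration rule, and the assumption that maintenance fully restores the machine are all hypothetical values chosen only to make the trade-off visible.

```python
from dataclasses import dataclass


@dataclass
class Job:
    processing_time: float  # nominal time on a fully healthy machine
    due_date: float


# Hypothetical parameters (not taken from the paper).
STAGE_SLOWDOWN = [1.0, 1.2, 1.5]  # processing-time multiplier per deterioration stage
STAGE_RUN_COST = [1.0, 1.5, 2.5]  # operating cost per unit time per stage
MAINT_TIME = 2.0                  # duration of one maintenance action
MAINT_COST = 5.0                  # fixed cost of one maintenance action
TARDINESS_WEIGHT = 1.0            # cost per unit of tardiness


def total_cost(jobs, plan):
    """Evaluate a plan made of ("job", index) and ("maintain",) actions.

    Assumptions: the machine deteriorates by one stage after each job
    (capped at the worst stage) and maintenance resets it to stage 0.
    """
    clock = 0.0
    stage = 0
    cost = 0.0
    for action in plan:
        if action[0] == "maintain":
            clock += MAINT_TIME
            cost += MAINT_COST
            stage = 0
        else:
            job = jobs[action[1]]
            duration = job.processing_time * STAGE_SLOWDOWN[stage]
            clock += duration
            cost += duration * STAGE_RUN_COST[stage]                      # operating cost
            cost += TARDINESS_WEIGHT * max(0.0, clock - job.due_date)  # tardiness cost
            stage = min(stage + 1, len(STAGE_SLOWDOWN) - 1)
    return cost
```

Under these assumed numbers, inserting a maintenance action lowers later operating cost but delays subsequent jobs, which is the kind of conflict a learned policy has to balance. In a reinforcement learning setting, the negative of each step's cost increment could serve as the reward signal.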

Key words: single-machine scheduling; equipment maintenance; deep reinforcement learning; deterioration effect; integrated optimization

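Received: 2025-02-19 Published: 2026-05-23

CLC: TP 181

Funding: Key Program of the National Natural Science Foundation of China (W2411062); Zhejiang Provincial Natural Science Foundation of China (LGG22G010002); National Natural Science Foundation of China (52005447, 71871203).

Corresponding author: Wenchao YI E-mail: cy@zjut.edu.cn; yiwenchao@zjut.edu.cn

About the author: Yong CHEN (born 1973), male, professor, engaged in research on intelligent algorithms and optimization for complex systems. orcid.org/0000-0001-7778-2731. E-mail: cy@zjut.edu.cn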


Cite this article:

Yong CHEN, Xizhi DU, Yiwei JIANG, Wenchao YI, Zhi PEI, Zuzhen JI. Deep reinforcement learning models and algorithms for single-machine scheduling considering deteriorated maintenance[J]. Journal of Zhejiang University (Engineering Science), 2026, 60(7): 1528-1538.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2026.07.015 or https://www.zjujournals.com/eng/CN/Y2026/V60/I7/1528

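Fig. 1 Schematic diagram of the deterioration effect process

Fig. 2 State transitions

Fig. 3 Scheduling decision model

Table 1 Description of the action space

Fig. 4 Reward curves of machine health state versus scheduling strategy

Fig. 5 Reward versus running steps under a single strategy

Fig. 6 Network structure of the DRL algorithm

Fig. 7 Flowchart of the DRL algorithm

Table 2 Mean and standard deviation of cost under R-M integrated optimization strategies

Table 3 Minimum cost under R-M integrated optimization strategies

Table 4 Environment parameter settings

Table 5 Hyperparameter settings of the DRL algorithms

Fig. 8 Effect of problem scale on average episode reward

Fig. 9 Effect of problem scale on average episode length in steps

Fig. 10 Computation speed curves (scale of 80)

Table 6 Comparison of training performance of the DRL algorithms

Table 7 Mean and standard deviation of optimized cost for the DRL methods

Fig. 11 Comparison of cost optimization results of different algorithms

Table 8 Best values obtained by the DRL methods

Table 9 Cost optimization effect of the DRL methods

Table 10 Friedman test ranking results of different methods

Fig. 12 Record of scheduling results (scale of 50)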

