Deep reinforcement learning models and algorithms for single-machine scheduling considering deteriorated maintenance

doi:10.3785/j.issn.1008-973X.2026.07.015

Journal of ZheJiang University (Engineering Science)

2026, Vol. 60, Issue (7): 1528-1538

Yong CHEN, Xizhi DU, Yiwei JIANG, Wenchao YI*, Zhi PEI, Zuzhen JI

College of Mechanical Engineering, Zhejiang University of Technology, Hangzhou 310023, China

Abstract

A multi-stage machine state model was proposed for the single-machine scheduling problem under machine degradation and maintenance strategies, with the objective of minimizing the total production cost. A state transition mechanism was designed to incorporate both degradation evolution and maintenance effects. Job tardiness cost, machine operating cost, and maintenance cost were jointly considered to improve the economic efficiency of the entire production process. An integrated decision-making framework for scheduling and maintenance was developed based on deep reinforcement learning, in which an agent was trained through interaction with the environment to learn optimized scheduling and maintenance strategies, so that joint decisions on job sequencing and maintenance timing were realized in complex dynamic systems. Benchmark instances of various scales were designed, and the effectiveness of the proposed model and framework was validated through computational experiments. The results indicate that the proposed approach achieves lower total scheduling and maintenance costs than several integrated optimization strategies. The conflict between production scheduling and machine maintenance is effectively balanced, and a more advantageous integrated strategy for scheduling and maintenance is obtained in dynamic and uncertain environments.
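The interaction loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the class names (`Job`, `SingleMachineEnv`), the number of degradation stages, the linear slowdown per stage, the degradation probability, and all cost parameters are hypothetical values chosen only to show how a multi-stage machine state, a maintenance action, and a cost made of tardiness, operating, and maintenance terms could fit together in an environment that an agent learns from.

```python
import random
from dataclasses import dataclass


@dataclass
class Job:
    processing_time: float   # nominal time on a machine in perfect condition
    due_date: float
    tardiness_weight: float = 1.0


@dataclass
class SingleMachineEnv:
    """Toy single-machine environment (all parameters are assumptions)."""
    jobs: list
    n_stages: int = 4                 # hypothetical number of degradation stages
    slowdown_per_stage: float = 0.2   # hypothetical: +20% processing time per stage
    degrade_prob: float = 0.3         # hypothetical chance to degrade after a job
    operating_cost_rate: float = 1.0  # cost per unit of processing time
    maintenance_cost: float = 5.0
    maintenance_time: float = 2.0
    seed: int = 0

    def __post_init__(self):
        self.rng = random.Random(self.seed)
        self.reset()

    def reset(self):
        self.clock = 0.0
        self.stage = 0                            # 0 means "as good as new"
        self.remaining = list(range(len(self.jobs)))
        return self._state()

    def _state(self):
        return (self.stage, self.clock, tuple(self.remaining))

    def step(self, action):
        """action: index of a remaining job, or -1 for maintenance.

        Returns (state, reward, done), where reward is the negative cost.
        """
        if action == -1:
            # Maintenance takes time, costs money, and restores stage 0.
            self.clock += self.maintenance_time
            self.stage = 0
            cost = self.maintenance_cost
        else:
            job = self.jobs[action]
            self.remaining.remove(action)  # raises ValueError if already done
            duration = job.processing_time * (1 + self.slowdown_per_stage * self.stage)
            self.clock += duration
            tardiness = max(0.0, self.clock - job.due_date)
            cost = job.tardiness_weight * tardiness + self.operating_cost_rate * duration
            # The machine may move to the next degradation stage after a job.
            if self.stage < self.n_stages - 1 and self.rng.random() < self.degrade_prob:
                self.stage += 1
        done = not self.remaining
        return self._state(), -cost, done


def threshold_policy(env, maintain_at=2):
    """Baseline rule: maintain at stage >= maintain_at (keep it >= 1), else earliest due date."""
    def policy(state):
        stage, _clock, remaining = state
        if stage >= maintain_at:
            return -1
        return min(remaining, key=lambda j: env.jobs[j].due_date)
    return policy


def rollout(env, policy):
    """Run one episode and return the total cost accumulated by the policy."""
    state, total_cost, done = env.reset(), 0.0, False
    while not done:
        state, reward, done = env.step(policy(state))
        total_cost -= reward
    return total_cost
```

A deep reinforcement learning agent would replace `threshold_policy` with a learned mapping from state to action and would be trained on the rewards returned by `step`. The rule-based policy is included only as a simple baseline of the kind such an agent is typically compared against.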

Key words: single-machine scheduling, equipment maintenance, deep reinforcement learning, deterioration effect, integrated optimization

Received: 19 February 2025 Published: 23 May 2026

CLC: TP 181

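Fund: Key Program of the National Natural Science Foundation of China (W2411062); Zhejiang Provincial Natural Science Foundation of China (LGG22G010002); National Natural Science Foundation of China (52005447, 71871203).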

Corresponding Authors: Wenchao YI. E-mail: cy@zjut.edu.cn; yiwenchao@zjut.edu.cn

Cite this article:

Yong CHEN, Xizhi DU, Yiwei JIANG, Wenchao YI, Zhi PEI, Zuzhen JI. Deep reinforcement learning models and algorithms for single-machine scheduling considering deteriorated maintenance. Journal of ZheJiang University (Engineering Science), 2026, 60(7): 1528-1538.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2026.07.015 OR https://www.zjujournals.com/eng/Y2026/V60/I7/1528

Fig.1 Deterioration effect process

Fig.2 Transition of state

Fig.3 Scheduling decision model

Tab.1 Description of action space

Fig.4 Machine status and scheduling strategy reward curve

Fig.5 Single strategy running step and reward curve

Fig.6 Network structure of DRL algorithm

Fig.7 Algorithm flow of DRL

Tab.2 Mean and standard deviation of cost with R-M integrated optimization strategies

Tab.3 Minimum of cost with R-M integrated strategies

Tab.4 Environment parameter setting

Tab.5 Hyperparameter settings of DRL algorithms

Fig.8 Influence curves of scale on average reward of episode

Fig.9 Influence curves of scale on average step of episode

Fig.10 Calculation speed curves with scale of 80

Tab.6 Training performance comparison of DRL algorithms

Tab.7 Optimized cost mean and standard deviation of DRL algorithms

Fig.11 Comparison of optimization cost results of different algorithms

Tab.8 Optimized minimum cost of DRLs

Tab.9 Cost optimization effect of DRL methods

Tab.10 Sorting results of Friedman test by different methods

Fig.12 Scheduling result and records with scale of 50

