Deep reinforcement learning models and algorithms for single-machine scheduling considering deteriorated maintenance

Yong CHEN, Xizhi DU, Yiwei JIANG, Wenchao YI, Zhi PEI, Zuzhen JI

Tab. 1 Description of action space

Symbol	Action	Description	Mathematical form
a₁	SPT	Shortest processing time first	$ \min_{j\in B}\; p_{j} $
a₂	LPT	Longest processing time first	$ \max_{j\in B}\; p_{j} $
a₃	EDD	Earliest due date first	$ \min_{j\in B}\; d_{j} $
a₄	FCFS	Earliest arrival time first	$ \min_{j\in B}\; \mathrm{arrive}_{j} $
a₅	MST	Minimum slack time first	$ \min_{j\in B}\; \left(d_{j}-\left(t+p_{j}\right)\right) $
a₆	CR	Smallest critical ratio first	$ \min_{j\in B}\; \left(d_{j}-t\right)/p_{j} $
a₇	MDD	Modified due date first	$ \min_{j\in B}\; \max\left(d_{j},\, t+p_{j}\right) $
a₈	PM	Perform imperfect maintenance	$ M_{i-1}+R_{\mathrm{PM}} $
a₉	CM	Perform perfect maintenance	$ M_{i-1}+R_{\mathrm{CM}} $
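
The seven dispatching rules a₁ to a₇ in Tab. 1 can be read as priority keys evaluated over the jobs in the buffer B at the current time t. The following is a minimal sketch under stated assumptions, not the authors' implementation: the `Job` class, the `RULES` dictionary, and the `select_job` function are hypothetical names introduced here for illustration, and ties are broken by buffer order, which the table does not specify. The maintenance actions a₈ and a₉ are left out because the table alone does not define how the terms M and R are computed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Job:
    """Hypothetical job record; field names follow the symbols in Tab. 1."""
    name: str
    p: float       # processing time p_j
    d: float       # due date d_j
    arrive: float  # arrival time arrive_j


# Each rule maps (job, current time t) to a priority key.
# The job with the SMALLEST key is selected, so LPT negates p_j
# to turn the max in Tab. 1 into a min.
RULES: Dict[str, Callable[[Job, float], float]] = {
    "SPT": lambda j, t: j.p,                  # min p_j
    "LPT": lambda j, t: -j.p,                 # max p_j
    "EDD": lambda j, t: j.d,                  # min d_j
    "FCFS": lambda j, t: j.arrive,            # min arrive_j
    "MST": lambda j, t: j.d - (t + j.p),      # min slack d_j - (t + p_j)
    "CR": lambda j, t: (j.d - t) / j.p,       # min critical ratio
    "MDD": lambda j, t: max(j.d, t + j.p),    # min modified due date
}


def select_job(rule: str, buffer: List[Job], t: float) -> Job:
    """Return the job in the buffer chosen by the named rule at time t.

    Assumption: ties go to the job that appears first in the buffer,
    which is the behaviour of Python's built-in min().
    """
    if not buffer:
        raise ValueError("buffer B is empty")
    key = RULES[rule]
    return min(buffer, key=lambda j: key(j, t))


if __name__ == "__main__":
    # Illustrative data only; not taken from the paper.
    B = [
        Job("J1", p=4, d=10, arrive=0),
        Job("J2", p=2, d=12, arrive=1),
        Job("J3", p=6, d=11, arrive=2),
    ]
    for rule in RULES:
        print(rule, select_job(rule, B, t=3.0).name)
```

Expressing every rule as a key to minimise keeps the action set uniform, so an agent's discrete action index can be mapped to a rule name and passed straight to `select_job`.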