Deep reinforcement learning models and algorithms for single-machine scheduling considering deteriorated maintenance

Yong CHEN, Xizhi DU, Yiwei JIANG, Wenchao YI, Zhi PEI, Zuzhen JI

Tab.6 Training performance comparison of DRL algorithms


| Size | Algorithm | Time steps/10^6 | Mean FPS | Training time/h |
| --- | --- | --- | --- | --- |
| 10 | A2C | 1 | 1498.4 | 0.18 |
| 10 | DQN | 1 | 1660.8 | 0.27 |
| 10 | PPO | 1 | 1141.6 | 0.24 |
| 20 | A2C | 2 | 1473.7 | 0.37 |
| 20 | DQN | 2 | 1440.6 | 0.54 |
| 20 | PPO | 2 | 1109.6 | 0.50 |
| 30 | A2C | 2 | 1447.3 | 0.38 |
| 30 | DQN | 2 | 1393.9 | 0.54 |
| 30 | PPO | 2 | 1093.3 | 0.51 |
| 50 | A2C | 4 | 1400.9 | 0.79 |
| 50 | DQN | 4 | 1283.4 | 1.11 |
| 50 | PPO | 4 | 1065.4 | 1.05 |
| 80 | A2C | 8 | 1360.6 | 1.63 |
| 80 | DQN | 8 | 1154.3 | 2.27 |
| 80 | PPO | 8 | 1038.6 | 2.15 |
| 100 | A2C | 8 | 1350.4 | 1.66 |
| 100 | DQN | 8 | 1121.4 | 2.31 |
| 100 | PPO | 8 | 1018.6 | 2.19 |
| 150 | A2C | 16 | 1276.1 | 3.51 |
| 150 | DQN | 16 | 1051.9 | 4.68 |
| 150 | PPO | 16 | 1007.2 | 4.33 |