Decision-making for multi-USV adversarial encirclement based on adaptive curriculum reinforcement learning

Lang CHEN, Zengli LIU, Xuanzhi ZHAO

Table 5. Comparison of performance metrics of different algorithms under two target strategies

| Target strategy | Algorithm | $C_{\text{u}}$ | $C_{\text{T}}$ | $P_{\text{u}}$ | $P_{\text{T}}$ | $D_{\text{avg}}$/m | $R_{\text{avg}}$ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| RL | ACL-MAPPO | 194 | 40 | 46 | 2 | 302.324 | 55.253 |
| RL | CL-MAPPO | 364 | 58 | 71 | 22 | 367.474 | 42.437 |
| RL | NOCL-MAPPO | 1 401 | 134 | 432 | 38 | 421.161 | 17.367 |
| RL | ACL-MADDPG | 389 | 67 | 83 | 31 | 381.193 | 40.152 |
| Random | ACL-MAPPO | 942 | 301 | 154 | 54 | 285.391 | 37.113 |
| Random | CL-MAPPO | 1 224 | 304 | 205 | 52 | 321.482 | 30.172 |
| Random | NOCL-MAPPO | 1 558 | 150 | 367 | 56 | 343.514 | 13.627 |
| Random | ACL-MADDPG | 1 287 | 317 | 221 | 61 | 337.643 | 25.241 |