Decision-making for multi-USV adversarial encirclement based on adaptive curriculum reinforcement learning
Lang CHEN, Zengli LIU, Xuanzhi ZHAO
Tab. 4 Hyperparameter settings of the MAPPO and PPO algorithms
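
| Parameter | MAPPO | PPO |
| --- | --- | --- |
| $M$ | 2048 | 1024 |
| $B_{\text{s}}$ | 256 | 128 |
| $l_{\mathrm{r}}$ | $2\times10^{-4}$ | $10^{-4}$ |
| $\gamma$ | 0.99 | 0.99 |
| $\epsilon$ | 0.2 | 0.2 |
| $\lambda_{\text{GAE}}$ | 0.95 | 0.95 |
| $\alpha_{\text{H}}$ | 0.01 | 0.01 |
| $T_{\max}$ | 300 | 300 |
| $E_{\text{epoch}}$ | 10 | 3 |
| $\tau_{\text{clip}}$ | 0.5 | |
| $n_{\text{h}}$ | $2\times256$ | $2\times128$ |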
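
As a rough illustration of how the settings in Tab. 4 might be carried into a training script, the sketch below stores them in a small Python dataclass. It is not the authors' implementation. The class name `PPOHyperparams` and its helper methods are hypothetical, and the meaning attached to each symbol is an assumed reading, because the table does not define them: $M$ as rollout buffer size, $B_{\text{s}}$ as mini-batch size, $E_{\text{epoch}}$ as optimisation epochs per update (read as 10 for MAPPO and 3 for PPO), and $\tau_{\text{clip}}$ as a gradient-norm clip listed for MAPPO only. Under those assumptions, the helpers work out how many mini-batches and gradient updates each rollout yields.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PPOHyperparams:
    """Values from Tab. 4; the meanings given in comments are assumed readings."""

    M: int                     # assumed: rollout buffer size (samples)
    B_s: int                   # assumed: mini-batch size
    l_r: float                 # assumed: learning rate
    gamma: float               # discount factor
    epsilon: float             # assumed: PPO clipping range
    lambda_gae: float          # GAE lambda
    alpha_h: float             # assumed: entropy bonus coefficient
    t_max: int                 # assumed: maximum steps per episode
    e_epoch: int               # assumed: optimisation epochs per update
    tau_clip: Optional[float]  # assumed: gradient-norm clip; None if not listed
    n_h: Tuple[int, int]       # assumed: (hidden layers, units per layer)

    def minibatches_per_epoch(self) -> int:
        # Number of mini-batches obtained by splitting the buffer once.
        return self.M // self.B_s

    def updates_per_rollout(self) -> int:
        # Gradient steps taken on one collected buffer.
        return self.e_epoch * self.minibatches_per_epoch()


MAPPO = PPOHyperparams(
    M=2048, B_s=256, l_r=2e-4, gamma=0.99, epsilon=0.2, lambda_gae=0.95,
    alpha_h=0.01, t_max=300, e_epoch=10, tau_clip=0.5, n_h=(2, 256),
)
PPO = PPOHyperparams(
    M=1024, B_s=128, l_r=1e-4, gamma=0.99, epsilon=0.2, lambda_gae=0.95,
    alpha_h=0.01, t_max=300, e_epoch=3, tau_clip=None, n_h=(2, 128),
)

if __name__ == "__main__":
    for name, cfg in (("MAPPO", MAPPO), ("PPO", PPO)):
        print(name, cfg.minibatches_per_epoch(), cfg.updates_per_rollout())
```

If these readings hold, both configurations split their buffer into the same number of mini-batches, so the difference in gradient updates per rollout comes entirely from the epoch count.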