Decision-making for multi-USV adversarial encirclement based on adaptive curriculum reinforcement learning
Lang CHEN, Zengli LIU, Xuanzhi ZHAO
Table 3 Reward function parameter settings
| Parameter | Value | Parameter | Value |
|---|---|---|---|
| $k_{\text{d1}}$, $k_{\text{d2}}$ | 0.01, 0.15 | $K_{\text{aph}}$ | 300 |
| $k_{\text{a1}}$, $k_{\text{a2}}$ | 0.5, 0.1 | $K_{\text{eva}}$ | 0.8 |
| $k_{\text{v}}$ | 0.5 | $w_{1}, w_{2}, w_{3}, w_{4}$ | 0.35, 0.17, 0.28, 0.20 |
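The settings in Table 3 can be held in a single immutable configuration object. The sketch below only transcribes the tabulated values. This excerpt does not define what each coefficient does, so the class name `RewardParams`, the helper `combine`, and the reading of $w_1$ to $w_4$ as normalized weights on four reward sub-terms are all hypothetical. That reading is suggested only by the fact that the four weights sum to 1.00.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class RewardParams:
    """Values transcribed from Table 3 (hypothetical container).

    The role of each coefficient in the reward function is not defined
    in this excerpt; field names simply mirror the table symbols.
    """
    k_d1: float = 0.01
    k_d2: float = 0.15
    k_a1: float = 0.5
    k_a2: float = 0.1
    k_v: float = 0.5
    K_aph: float = 300.0
    K_eva: float = 0.8
    w: Tuple[float, float, float, float] = (0.35, 0.17, 0.28, 0.20)

    def validate(self) -> None:
        # Assumption: w1..w4 are normalized weights, so they must sum to 1.
        if len(self.w) != 4:
            raise ValueError("expected four weights w1..w4")
        total = sum(self.w)
        if not math.isclose(total, 1.0, abs_tol=1e-9):
            raise ValueError(f"weights sum to {total}, expected 1.0")


def combine(components: Sequence[float], params: RewardParams) -> float:
    """Hypothetical weighted sum r = w1*r1 + w2*r2 + w3*r3 + w4*r4."""
    params.validate()
    if len(components) != len(params.w):
        raise ValueError("need exactly one sub-reward per weight")
    return sum(w * r for w, r in zip(params.w, components))


if __name__ == "__main__":
    p = RewardParams()
    p.validate()
    print(combine((1.0, 0.0, 0.0, 0.0), p))  # prints 0.35
```

Freezing the dataclass keeps the tabulated values from being changed accidentally during training. The `validate` check only makes sense under the normalized-weight assumption stated above.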