Decision-making for multi-USV adversarial encirclement based on adaptive curriculum reinforcement learning
Lang CHEN, Zengli LIU, Xuanzhi ZHAO
Tab.3 Setting of reward function parameters

| Parameter | Value | Parameter | Value |
|---|---|---|---|
| $ {k}_{\text{d1}} $, $ {k}_{\text{d2}} $ | 0.01, 0.15 | $ {K}_{\text{aph}} $ | 300 |
| $ {k}_{\text{a1}} $, $ {k}_{\text{a2}} $ | 0.5, 0.1 | $ {K}_{\text{eva}} $ | 0.8 |
| $ {k}_{\text{v}} $ | 0.5 | $ {w}_{1},{w}_{2},{w}_{3},{w}_{4} $ | 0.35, 0.17, 0.28, 0.20 |
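
For readers reproducing the setup, the values in Tab.3 could be collected in a small configuration object, as in the minimal sketch below. The class name `RewardParams`, its field names, and the helper `weights_sum_to_one` are hypothetical and chosen only to mirror the symbols in the table. This section does not define how each parameter enters the reward function, so the sketch stores the values without computing any reward. The check that the four weights sum to 1 is an observation about the listed numbers, not a constraint stated by the authors.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardParams:
    """Hypothetical container for the Tab.3 values (names are assumptions)."""
    k_d1: float = 0.01
    k_d2: float = 0.15
    k_a1: float = 0.5
    k_a2: float = 0.1
    k_v: float = 0.5
    K_aph: float = 300.0
    K_eva: float = 0.8
    # w1..w4 in the order given in the table
    w: tuple = (0.35, 0.17, 0.28, 0.20)

    def weights_sum_to_one(self) -> bool:
        # Compare with a tolerance to avoid floating-point rounding issues.
        return math.isclose(sum(self.w), 1.0, abs_tol=1e-9)


params = RewardParams()
print(params.weights_sum_to_one())
```

Freezing the dataclass keeps the parameter set immutable during training, which makes it harder to alter a coefficient by accident between curriculum stages.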