Multi-ship collision avoidance via route exchange mechanism: strategy learning and game-theoretic decision making
Yang WANG, Hongchao LIU, Chi TIAN, Bing WU, Di ZHANG
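Table 1. Parameters of the multi-agent deep deterministic policy gradient (MADDPG) algorithm

| Parameter | Value | Parameter | Value |
| --- | --- | --- | --- |
| Maximum number of training episodes | $10^4$ | Replay buffer size $D$ | $10^6$ |
| Maximum time steps per episode, Step | 500 | Network learning rate $\alpha$ | 0.0005 |
| Number of sampled transitions (batch size) $K$ | 256 | Reward discount factor $\gamma$ | 0.98 |
| Soft-update coefficient $\tau$ | 0.0005 | | |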
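
As a minimal sketch, the settings in Table 1 could be collected into a single configuration object as shown below. The class name `MADDPGConfig`, its field names, and the helper `soft_update` are hypothetical and do not come from the paper. The episode count and buffer size are read as $10^4$ and $10^6$, assuming the exponents were flattened during extraction. The soft-update rule shown is the usual Polyak-averaging form used with DDPG-style target networks, which the table does not spell out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MADDPGConfig:
    """Hypothetical container for the Table 1 hyperparameters."""
    max_episodes: int = 10_000       # maximum number of training episodes (assumed 10^4)
    max_steps: int = 500             # maximum time steps per episode (Step)
    batch_size: int = 256            # number of sampled transitions K
    buffer_size: int = 1_000_000     # replay buffer size D (assumed 10^6)
    learning_rate: float = 0.0005    # network learning rate alpha
    gamma: float = 0.98              # reward discount factor
    tau: float = 0.0005              # soft-update coefficient


def soft_update(target: List[float], online: List[float], tau: float) -> List[float]:
    """Assumed Polyak averaging: theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online, target)]


cfg = MADDPGConfig()
# Toy parameter vectors standing in for network weights.
new_target = soft_update(target=[0.0, 1.0], online=[1.0, 1.0], tau=cfg.tau)
```

With $\tau = 0.0005$, each update moves the target parameters only 0.05% of the way toward the online parameters, so the target networks change slowly relative to the learned networks.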