Path planning of agricultural robots based on an improved deep reinforcement learning algorithm
Wei ZHAO, Wanzhi ZHANG, Jialin HOU, Rui HOU, Yuhua LI, Lejun ZHAO, Jin CHENG
Tab. 2 Main hyperparameters in the simulation experiments
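Parameter                              Value    Parameter                   Value
Centerline reward factor a             1.25     Replay buffer capacity Rn   128 000
Region reward factor b                 ±1.2     Learning rate lr            0.001
Greedy weight α                        0.6      Training episodes Ep        2 000
Prioritized experience replay weight β 0.4      Time steps per episode Es   1 000
Discount rate γ                        0.9      Network update frequency ui 10
Sampling guarantee factor τ            0.3      Batch size bs               128
Decay steps εd                         1 000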
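
For readers who want to reproduce the setup, the values in Tab. 2 can be gathered into a single configuration object. The following is a minimal Python sketch, not the authors' implementation: the class name `Hyperparameters`, its field names, and the helper functions `per_probabilities` and `importance_weights` are hypothetical. The two helpers assume that α and β play the same roles as in standard proportional prioritized experience replay (α as the prioritization exponent, β as the importance-sampling exponent); the table itself does not confirm this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperparameters:
    """Values copied from Tab. 2; the field names are illustrative only."""
    centerline_reward_factor: float = 1.25   # a
    region_reward_magnitude: float = 1.2     # |b|; the table lists b = ±1.2
    greedy_weight: float = 0.6               # α
    per_weight: float = 0.4                  # β
    discount_rate: float = 0.9               # γ
    sampling_guarantee_factor: float = 0.3   # τ
    decay_steps: int = 1_000                 # εd
    replay_capacity: int = 128_000           # Rn
    learning_rate: float = 0.001             # lr
    episodes: int = 2_000                    # Ep
    steps_per_episode: int = 1_000           # Es
    update_frequency: int = 10               # ui
    batch_size: int = 128                    # bs

    def max_total_steps(self) -> int:
        # Upper bound on environment steps if every episode runs to its step limit.
        return self.episodes * self.steps_per_episode


def per_probabilities(priorities, alpha):
    """Assumed standard proportional PER: P(i) = p_i^alpha / sum_k p_k^alpha.

    Priorities must be strictly positive.
    """
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def importance_weights(probabilities, beta):
    """Assumed standard PER correction: w_i = (N * P(i))^(-beta), scaled so max(w) = 1."""
    n = len(probabilities)
    raw = [(n * p) ** (-beta) for p in probabilities]
    max_w = max(raw)
    return [w / max_w for w in raw]


hp = Hyperparameters()
probs = per_probabilities([1.0, 2.0, 4.0], hp.greedy_weight)
weights = importance_weights(probs, hp.per_weight)
```

With the tabulated values, the replay buffer holds exactly 1 000 batches of 128 samples, and training is capped at 2 000 × 1 000 environment steps if no episode terminates early.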