Research on robot constant force control of surface tracking based on reinforcement learning
Tie ZHANG, Meng XIAO, Yan-biao ZOU, Jia-dong XIAO

Fig. 12 Predicted and actual reward function values at the 1st and 8th PILCO iterations