Imitation learning for bipedal robots based on simplified probabilistic framework for options
Wen XUE, Shuo ZHAO, Yongqiang LI

Tab.2 Mean return and standard deviation of different imitation learning methods in typical continuous action control tasks
| Method ($\overline{R} \pm \sigma$) | Humanoid robot walking | Robotic arm pushing | 2D walking robot | Swimming robot | Bipedal walking robot | Humanoid robot standing up | Quadruped ant robot | One-legged hopping robot | Bipedal cheetah model |
|---|---|---|---|---|---|---|---|---|---|
| GAIL-Option | 1701.46 ± 1109.85 | −636.24 ± 239.12 | 509.71 ± 132.52 | 12.38 ± 41.99 | 257.79 ± 10.60 | 360.16 ± 198.64 | 1958.55 ± 226.74 | 489.35 ± 74.15 | 1924.50 ± 217.21 |
| SPFFO | −158.63 ± 48.26 | −576.64 ± 33.26 | 1217.23 ± 756.50 | −3.97 ± 7.60 | −1.74 ± 7.84 | 275.29 ± 40.56 | 1027.70 ± 256.64 | 553.32 ± 279.83 | 2390.93 ± 358.77 |
| PFFO | 498.22 ± 28.01 | −391.40 ± 14.19 | 333.87 ± 132.80 | 41.52 ± 6.75 | 203.25 ± 86.42 | 885.02 ± 25.37 | 2251.33 ± 214.70 | 329.40 ± 194.13 | 1898.57 ± 574.98 |
| This study | 517.44 ± 46.11 | −384.08 ± 24.51 | 2296.74 ± 250.19 | 45.94 ± 1.95 | 264.40 ± 16.82 | 915.27 ± 48.67 | 2407.19 ± 106.10 | 704.11 ± 61.90 | 2871.36 ± 155.93 |
| DVL | 153.56 ± 63.20 | −44.11 ± 0.12 | 192.49 ± 175.22 | 31.01 ± 3.05 | −95.51 ± 18.47 | 812.97 ± 298.35 | 2010.40 ± 684.32 | 76.29 ± 43.06 | 340.72 ± 326.62 |
| ISWBC | −536.76 ± 594.76 | −387.42 ± 13.04 | 1593.60 ± 446.14 | 42.98 ± 5.65 | 256.99 ± 31.92 | −173.89 ± 146.13 | 2412.23 ± 72.90 | 488.13 ± 79.01 | 2738.28 ± 103.02 |
| HIPS | 531.89 ± 34.90 | −384.13 ± 17.24 | 1292.01 ± 189.64 | 25.08 ± 16.21 | 244.24 ± 29.73 | 865.51 ± 18.63 | 2338.80 ± 88.94 | 431.03 ± 69.39 | 1699.79 ± 187.79 |
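
One way to read Tab.2 is to ask which method has the highest mean return on each task. The short Python sketch below does this using the mean values transcribed from the table. It is only an illustration, not part of the authors' evaluation code: the helper name `best_method_per_task` is hypothetical, the task names are English renderings of the column headers, and the standard deviations are ignored, so it says nothing about statistical significance.

```python
# Mean returns transcribed from Tab.2 (standard deviations omitted).
# Task names are English renderings of the column headers (an assumption
# of this sketch, not identifiers used by the authors).
TASKS = [
    "Humanoid robot walking",
    "Robotic arm pushing",
    "2D walking robot",
    "Swimming robot",
    "Bipedal walking robot",
    "Humanoid robot standing up",
    "Quadruped ant robot",
    "One-legged hopping robot",
    "Bipedal cheetah model",
]

MEAN_RETURN = {
    "GAIL-Option": [1701.46, -636.24, 509.71, 12.38, 257.79, 360.16, 1958.55, 489.35, 1924.50],
    "SPFFO": [-158.63, -576.64, 1217.23, -3.97, -1.74, 275.29, 1027.70, 553.32, 2390.93],
    "PFFO": [498.22, -391.40, 333.87, 41.52, 203.25, 885.02, 2251.33, 329.40, 1898.57],
    "This study": [517.44, -384.08, 2296.74, 45.94, 264.40, 915.27, 2407.19, 704.11, 2871.36],
    "DVL": [153.56, -44.11, 192.49, 31.01, -95.51, 812.97, 2010.40, 76.29, 340.72],
    "ISWBC": [-536.76, -387.42, 1593.60, 42.98, 256.99, -173.89, 2412.23, 488.13, 2738.28],
    "HIPS": [531.89, -384.13, 1292.01, 25.08, 244.24, 865.51, 2338.80, 431.03, 1699.79],
}


def best_method_per_task(means, tasks):
    """Return {task: method with the highest mean return on that task}."""
    best = {}
    for i, task in enumerate(tasks):
        # max() over the method names, keyed by the mean in column i.
        best[task] = max(means, key=lambda method: means[method][i])
    return best


if __name__ == "__main__":
    for task, method in best_method_per_task(MEAN_RETURN, TASKS).items():
        print(f"{task}: {method}")
```

Because the comparison uses the means alone, close results such as the quadruped ant robot column (2412.23 ± 72.90 versus 2407.19 ± 106.10) should be read together with the standard deviations in the table.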