Imitation learning for bipedal robots based on simplified probabilistic framework for options

Wen XUE, Shuo ZHAO, Yongqiang LI

Tab.2 Mean return and standard deviation of different imitation learning methods in typical continuous action control tasks

All entries are $ \overline{R}\pm \sigma $ (mean return ± standard deviation).

| Method | Humanoid robot walking | Robotic arm pushing | 2D walking robot | Swimming robot | Bipedal walking robot | Humanoid robot standing up | Quadruped ant robot | One-legged hopping robot | Bipedal cheetah model |
|---|---|---|---|---|---|---|---|---|---|
| GAIL-Option | −1701.46 ± 1109.85 | −636.24 ± 239.12 | 509.71 ± 132.52 | 12.38 ± 41.99 | 257.79 ± 10.60 | 360.16 ± 198.64 | 1958.55 ± 226.74 | 489.35 ± 74.15 | 1924.50 ± 217.21 |
| SPFFO | −158.63 ± 48.26 | −576.64 ± 33.26 | 1217.23 ± 756.50 | −3.97 ± 7.60 | −1.74 ± 7.84 | 275.29 ± 40.56 | 1027.70 ± 256.64 | 553.32 ± 279.83 | 2390.93 ± 358.77 |
| PFFO | 498.22 ± 28.01 | −391.40 ± 14.19 | 333.87 ± 132.80 | 41.52 ± 6.75 | 203.25 ± 86.42 | 885.02 ± 25.37 | 2251.33 ± 214.70 | 329.40 ± 194.13 | 1898.57 ± 574.98 |
| This study | 517.44 ± 46.11 | −384.08 ± 24.51 | 2296.74 ± 250.19 | 45.94 ± 1.95 | 264.40 ± 16.82 | 915.27 ± 48.67 | 2407.19 ± 106.10 | 704.11 ± 61.90 | 2871.36 ± 155.93 |
| DVL | 153.56 ± 63.20 | −44.11 ± 0.12 | 192.49 ± 175.22 | 31.01 ± 3.05 | −95.51 ± 18.47 | 812.97 ± 298.35 | 2010.40 ± 684.32 | 76.29 ± 43.06 | 340.72 ± 326.62 |
| ISWBC | −536.76 ± 594.76 | −387.42 ± 13.04 | 1593.60 ± 446.14 | 42.98 ± 5.65 | 256.99 ± 31.92 | −173.89 ± 146.13 | 2412.23 ± 72.90 | 488.13 ± 79.01 | 2738.28 ± 103.02 |
| HIPS | 531.89 ± 34.90 | −384.13 ± 17.24 | 1292.01 ± 189.64 | 25.08 ± 16.21 | 244.24 ± 29.73 | 865.51 ± 18.63 | 2338.80 ± 88.94 | 431.03 ± 69.39 | 1699.79 ± 187.79 |