Journal of Zhejiang University (Engineering Science)  2019, Vol. 53 Issue (10): 1865-1873    DOI: 10.3785/j.issn.1008-973X.2019.10.003
Mechanical and Energy Engineering
Research on robot constant force control of surface tracking based on reinforcement learning
Tie ZHANG(),Meng XIAO,Yan-biao ZOU,Jia-dong XIAO
School of Mechanical and Automotive Engineering, South China University of Technology, Guangzhou 510640, China
Abstract:

A contact model between the robot end-effector and the curved workpiece was established to address the difficulty of maintaining a constant contact force while the end-effector tracks a curved surface. The relationship between the contact force coordinate system on the curved surface and the measurement coordinate system of the robot's force sensor was constructed. The relationship between the model's output parameters and the contact state was learned with probabilistic inference for learning control (PILCO), a reinforcement learning algorithm built on a probabilistic dynamics model. Part of the contact state was predicted from the output state, and the reinforcement learning algorithm optimized the robot's displacement input parameters to achieve a constant force. The input state of the reinforcement learning was changed to the average state value over a period of time, which reduced interference with the input state during the experiments. The experimental results showed that the algorithm obtained a stable force after 8 iterations, converged faster than the fuzzy iterative algorithm, and reduced the mean absolute value of the force error by 29%.

Key words: robot; contour tracking; force control; probabilistic inference for learning control (PILCO); reinforcement learning
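To make the learning loop concrete, the following is a minimal sketch of the model-based idea in Python. It is not the authors' implementation: a hypothetical one-dimensional contact model (`read_force`, stiffness `K_E`, and all numeric values) stands in for the robot, sensor, and coordinate transformation; the policy is a single displacement gain rather than the paper's controller; and the Gaussian-process model is queried only at its mean, whereas full PILCO propagates Gaussian state distributions analytically and optimizes a saturating cost by gradient. The moving average fed to the policy mirrors the paper's averaged input state.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
F_D, K_E, NOISE = 20.0, 5.0, 0.3  # desired force (N), stiffness (N/mm), sensor noise (N); illustrative

def read_force(depth):
    """Hypothetical 1-D contact model: force grows with pressing depth (mm)."""
    return K_E * max(depth, 0.0) + rng.normal(0.0, NOISE)

def rollout(gain, T=50, window=5):
    """Run the displacement policy; its input is a moving average of the
    force error, as in the paper's modified reinforcement-learning state."""
    depth, errors, transitions = 1.0, [], []
    for _ in range(T):
        errors.append(read_force(depth) - F_D)
        e_avg = float(np.mean(errors[-window:]))  # averaged input state
        u = -gain * e_avg                         # displacement correction (mm)
        transitions.append((e_avg, u))
        depth += u
    return np.array(transitions), np.array(errors)

# 1) Collect experience on the plant with an initial policy.
transitions, errors = rollout(gain=0.01)

# 2) Fit a GP dynamics model: (averaged error, action) -> next error.
X, y = transitions[:-1], errors[1:]
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True).fit(X, y)

# 3) Improve the policy on the learned model rather than on the robot;
#    a grid search over the GP mean prediction keeps the sketch short.
def predicted_cost(gain, T=30):
    e, cost = errors[0], 0.0
    for _ in range(T):
        e = gp.predict(np.array([[e, -gain * e]]))[0]
        cost += e ** 2
    return cost

best_gain = min(np.linspace(0.01, 0.2, 20), key=predicted_cost)
print(f"model-optimised gain: {best_gain:.3f}")
```

Repeating steps 1–3 with the improved policy, each pass adding its rollout to the training set, is presumably what the abstract's eight iterations count; the data efficiency comes from performing the policy search on the learned model instead of on the hardware.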
Received: 2018-08-02    Published: 2019-09-30
CLC:  TP 242  
Author biography: ZHANG Tie (1968—), male, doctoral supervisor; research interests include key technologies of industrial robots and of mobile robots among service robots. orcid.org/0000-0001-9716-3970. E-mail: merobot@scut.edu.cn

Cite this article:


Tie ZHANG, Meng XIAO, Yan-biao ZOU, Jia-dong XIAO. Research on robot constant force control of surface tracking based on reinforcement learning. Journal of Zhejiang University (Engineering Science), 2019, 53(10): 1865-1873.

Link to this article:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2019.10.003        http://www.zjujournals.com/eng/CN/Y2019/V53/I10/1865

Fig. 1  Experimental platform for robot constant force tracking
Fig. 2  Close-up view of the robot end-effector
Fig. 3  Force analysis of the robot end-effector
Fig. 4  Position-based explicit force control
Fig. 5  Structure of policy-based reinforcement learning
Fig. 6  Structure of the PILCO algorithm
Fig. 7  Experimental setup for robot constant force tracking
Fig. 8  Curved workpiece being tracked
Fig. 9  Flowchart of the PILCO-based constant force tracking experiment
Fig. 10  Iterations 1 to 4 of the PILCO algorithm
Fig. 11  Iterations 5 to 8 of the PILCO algorithm
Iteration    $|\Delta f|_{\max}$/N    $\left|\Delta \bar f\right|$/N    $\Delta f_s$/N
1            16.4427                  1.7079                            3.1081
2            10.1849                  1.1349                            1.7589
3            11.6478                  0.9448                            1.5100
4            12.2922                  0.9581                            1.6046
5            12.3974                  1.1205                            1.7936
6            10.8174                  1.0855                            1.6199
7            11.9842                  0.8906                            1.5017
8            11.3955                  0.8554                            1.4408
Table 1  Comparison of force error parameters during PILCO iterations
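The three error statistics are not defined on this page; read in the usual way for force-tracking experiments (an assumption, with $f(t)$ the measured force, $f_d$ the desired force, $\Delta f(t) = f(t) - f_d$, and $n$ samples per iteration), they would be the peak absolute error, the mean absolute error quoted in the abstract, and the standard deviation of the error:

$$|\Delta f|_{\max} = \max_t \left|\Delta f(t)\right|, \qquad \left|\Delta \bar f\right| = \frac{1}{n}\sum_{t=1}^{n}\left|\Delta f(t)\right|, \qquad \Delta f_s = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(\Delta f(t) - \Delta \bar f\right)^2}.$$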
Fig. 12  Predicted and actual reward function values of PILCO at the 1st and 8th iterations
Fig. 13  Comparison of results of the PILCO algorithm and the fuzzy iterative algorithm
Algorithm          $|\Delta f|_{\max}$/N    $\left|\Delta \bar f\right|$/N    $\Delta f_s$/N
PILCO              11.3955                  0.8554                            1.4408
Fuzzy iteration     3.3132                  1.2092                            1.3423
Table 2  Comparison of the PILCO algorithm and the fuzzy iterative algorithm
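The 29% improvement quoted in the abstract follows from the $\left|\Delta \bar f\right|$ column of Table 2: $1 - 0.8554/1.2092 \approx 0.293$, i.e. PILCO's mean absolute force error is about 29% below that of the fuzzy iterative algorithm, while the fuzzy algorithm retains the smaller peak error.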