Journal of ZheJiang University (Engineering Science)  2024, Vol. 58 Issue (3): 599-610    DOI: 10.3785/j.issn.1008-973X.2024.03.017
Light-weight algorithm for real-time robotic grasp detection
Mingjun SONG, Wen YAN, Yizhao DENG, Junran ZHANG, Haiyan TU*
1. College of Electrical Engineering, Sichuan University, Chengdu 610065, China

Abstract  

A light-weight, real-time approach named RTGN (real-time grasp net) was proposed to improve the accuracy and speed of robotic grasp detection for novel objects of diverse shapes, types and sizes. Firstly, a multi-scale dilated convolution module was designed to construct a light-weight feature extraction backbone. Secondly, a mixed attention module was designed to help the network focus on meaningful features. Finally, a pyramid pool module was deployed to fuse the multi-level features extracted by the network, thereby improving its grasp perception of objects. On the Cornell grasping dataset, RTGN generated grasps at 142 frames per second and attained accuracy rates of 98.26% and 97.65% on the image-wise and object-wise splits, respectively. In real-world robotic grasping experiments, RTGN achieved a success rate of 96.0% over 400 grasping attempts on 20 novel objects. Experimental results demonstrate that RTGN outperforms existing methods in both detection accuracy and detection speed, adapts well to variations in the position and pose of grasped objects, and generalizes effectively to novel objects of diverse shapes, types and sizes.
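As a rough sketch of the pipeline the abstract describes, the following PyTorch skeleton wires a feature-extraction backbone, an attention stage and a pooling stage into four heatmap heads. Only the module names (MDM backbone, MAM, PPM; see Tab. 2) come from the paper; every internal detail here is a placeholder assumption, not the authors' implementation.

```python
# Minimal sketch of an RTGN-style pipeline; internals are placeholders.
import torch
import torch.nn as nn

class RTGNSketch(nn.Module):
    def __init__(self, in_ch: int = 3):
        super().__init__()
        self.backbone = nn.Sequential(   # stand-in for the MDM backbone
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.attention = nn.Identity()   # stand-in for the mixed attention module (MAM)
        self.ppm = nn.Identity()         # stand-in for the pyramid pool module (PPM)
        # Four 1x1 heads: grasp quality, cos 2θ, sin 2θ, gripper width
        # (assumed heatmap outputs, following Fig. 2).
        self.heads = nn.ModuleList([nn.Conv2d(64, 1, 1) for _ in range(4)])

    def forward(self, x):
        f = self.ppm(self.attention(self.backbone(x)))
        return [h(f) for h in self.heads]

q, c, s, w = RTGNSketch()(torch.rand(1, 3, 224, 224))
print(q.shape)  # torch.Size([1, 1, 112, 112])
```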



Key words: robotic grasping; grasp detection; attention mechanism; convolutional neural networks; deep learning; unstructured environment
Received: 22 August 2023      Published: 05 March 2024
CLC:  TP 242  
Fund: National Natural Science Foundation of China (12126606); Science and Technology Program of Sichuan Province (23ZDYF2913); Deyang Science and Technology (Open Competition) Program (2021JBJZ007); Emergency Key Project of the Key Laboratory of Smart Grid of Sichuan Province (020IEPG-KL-20YJ01).
Corresponding Author: Haiyan TU. E-mail: mingjun_s@foxmail.com; haiyantu@163.com
Cite this article:

Mingjun SONG,Wen YAN,Yizhao DENG,Junran ZHANG,Haiyan TU. Light-weight algorithm for real-time robotic grasp detection. Journal of ZheJiang University (Engineering Science), 2024, 58(3): 599-610.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2024.03.017     OR     https://www.zjujournals.com/eng/Y2024/V58/I3/599


Fig.1 Schematic diagram of five-dimensional grasp representation [6]
Fig.2 Four-dimensional grasp representation using heatmaps [16]
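The heatmap representation of Fig. 2 (after Morrison et al. [16]) encodes a grasp per pixel as a quality score, an angle stored as (cos 2θ, sin 2θ), and a gripper width. A minimal decoder sketch follows; the function and array names are illustrative assumptions, not RTGN's API.

```python
# Hypothetical decoder for the four heatmaps of Fig. 2.
import numpy as np

def decode_grasp(quality: np.ndarray, cos2t: np.ndarray,
                 sin2t: np.ndarray, width: np.ndarray):
    """Pick the pixel of highest grasp quality and recover (x, y, θ, w)."""
    y, x = np.unravel_index(np.argmax(quality), quality.shape)
    # The angle is stored as (cos 2θ, sin 2θ) so the ±π/2 symmetry of a
    # parallel-jaw grasp stays continuous; halve the angle to undo it.
    theta = 0.5 * np.arctan2(sin2t[y, x], cos2t[y, x])
    return x, y, theta, width[y, x]

# Usage on random 224×224 maps:
maps = [np.random.rand(224, 224) for _ in range(4)]
print(decode_grasp(*maps))
```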
Fig.3 Overview architecture of RTGN grasp detection algorithm
Fig.4 Structure of multi-scale dilated convolution module
Fig.5 3×3 dilated convolution kernels at different dilation rates
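Figs. 4-5 depict a multi-scale dilated convolution module: parallel 3×3 kernels at different dilation rates enlarge the receptive field at several scales without extra parameters per kernel. Below is a hedged sketch of such a module; the rates (1, 2, 4) and the channel split are assumptions, not the paper's exact configuration.

```python
# Sketch of a multi-scale dilated convolution block (assumed design).
import torch
import torch.nn as nn

class MultiScaleDilatedConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, rates=(1, 2, 4)):
        super().__init__()
        branch_ch = out_ch // len(rates)
        # padding = dilation keeps the spatial size of a 3×3 conv unchanged.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, branch_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(branch_ch), nn.ReLU(inplace=True))
            for r in rates])
        self.fuse = nn.Conv2d(branch_ch * len(rates), out_ch, 1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

print(MultiScaleDilatedConv(32, 96)(torch.rand(1, 32, 56, 56)).shape)
```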
Fig.6 Structure of channel attention module
Fig.7 Structure of coordinate attention module
Fig.8 Structure of mixed attention module
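Figs. 6-8 combine a channel attention module and a coordinate attention module (after Hou et al. [24]) into a mixed attention module (MAM). The sketch below chains simplified versions of the two blocks; the reduction ratio, ordering and internals are illustrative assumptions.

```python
# Sketch of a mixed attention module: channel attention then coordinate attention.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, ch: int, r: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(ch, ch // r), nn.ReLU(inplace=True),
            nn.Linear(ch // r, ch), nn.Sigmoid())

    def forward(self, x):
        w = self.mlp(x.mean(dim=(2, 3)))   # global average pool over H, W
        return x * w[:, :, None, None]

class CoordinateAttention(nn.Module):
    def __init__(self, ch: int):
        super().__init__()
        self.conv_h = nn.Conv2d(ch, ch, 1)
        self.conv_w = nn.Conv2d(ch, ch, 1)

    def forward(self, x):
        # Pool separately along width and height so the attention maps
        # keep positional information in each direction.
        a_h = torch.sigmoid(self.conv_h(x.mean(dim=3, keepdim=True)))  # N×C×H×1
        a_w = torch.sigmoid(self.conv_w(x.mean(dim=2, keepdim=True)))  # N×C×1×W
        return x * a_h * a_w

class MixedAttention(nn.Module):
    def __init__(self, ch: int):
        super().__init__()
        self.block = nn.Sequential(ChannelAttention(ch), CoordinateAttention(ch))

    def forward(self, x):
        return self.block(x)

print(MixedAttention(64)(torch.rand(1, 64, 28, 28)).shape)
```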
Fig.9 Structure of pyramid pool module
Fig.10 Structure of prediction head
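The pyramid pool module of Fig. 9 follows the design of Zhao et al. [20]: the feature map is pooled at several grid sizes, each level is projected by a 1×1 convolution, upsampled back to the input resolution, and concatenated with the input. A minimal sketch, assuming bin sizes (1, 2, 3, 6):

```python
# Sketch of a PSPNet-style pyramid pool module [20]; bins are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPoolModule(nn.Module):
    def __init__(self, in_ch: int, bins=(1, 2, 3, 6)):
        super().__init__()
        self.levels = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(b),
                          nn.Conv2d(in_ch, in_ch // len(bins), 1))
            for b in bins])

    def forward(self, x):
        h, w = x.shape[2:]
        # Upsample every pooled level back to (h, w), then fuse by concat.
        feats = [x] + [F.interpolate(level(x), size=(h, w),
                                     mode='bilinear', align_corners=False)
                       for level in self.levels]
        return torch.cat(feats, dim=1)

print(PyramidPoolModule(64)(torch.rand(1, 64, 28, 28)).shape)  # N×128×28×28
```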
Method | A/% (Image-wise) | A/% (Object-wise) | v/ms
Jiang et al. [5] | 60.50 | 58.30 | 5000
Lenz et al. [6] | 73.90 | 75.60 | 1350
Redmon et al. [10] | 88.00 | 87.10 | 76
Kumra et al. [11] | 89.21 | 88.96 | 103
Guo et al. [13] | 93.20 | 89.10 | —
Chu et al. [14] | 96.00 | 96.10 | 120
Zhou et al. [15] | 97.74 | 96.61 | 118
Xia et al. [8] | 93.80 | 91.30 | 57
Yu et al. [9] | 94.10 | 93.30 | —
Zhang et al. [7] | 95.71 | 94.01 | 17
Morrison et al. [16] | 73.00 | 69.00 | 19
Kumra et al. [17] | 97.70 | 96.60 | 20
Cheng et al. [18] | 98.00 | 97.00 | 73
Wang et al. [19] | 97.99 | 96.70 | 41.6
RTGN | 98.26 | 97.65 | 7
Tab.1 Comparison results of different algorithms on Cornell grasping dataset
Fig.11 Visualization results of grasping detection on Cornell grasping dataset predicted by RTGN
Fig.12 Incompletely labelled ground truth in Cornell grasping dataset
Network architecture | A/% (Image-wise) | A/% (Object-wise) | v/ms
MDM-Backbone | 97.73 | 96.80 | 5.29
+CBAM | 97.86 (+0.13) | 96.90 (+0.10) | 6.42
+MAM | 97.91 (+0.18) | 97.00 (+0.20) | 6.60
+PPM | 97.95 (+0.22) | 97.03 (+0.23) | 5.64
+CBAM+PPM | 98.08 (+0.35) | 97.18 (+0.38) | 6.74
+MAM+PPM | 98.26 (+0.53) | 97.65 (+0.85) | 6.96
Tab.2 Ablation experiments on Cornell grasping dataset
Method | A/% (Image-wise) | A/% (Object-wise) | v/ms | P/M | F/G
Kumra et al. [11] | 89.21 | 88.96 | 103 | >32 | —
Chu et al. [14] | 96.00 | 96.10 | 120 | 28.18 | —
Zhou et al. [15] | 97.74 | 96.61 | 118 | >30 | —
Morrison et al. [16] | 73.00 | 69.00 | 19 | 0.062 | —
Xia et al. [8] | 93.80 | 91.30 | 57 | >46 | —
Zhang et al. [7] | 95.71 | 94.01 | 17 | >12 | —
RTGN | 98.26 | 97.65 | 7 | 1.660 | 8.00
Tab.3 Comparison results of network performance and size for different methods
Fig.13 Comparison results of RTGN and TF-Grasp [19] on grasping detection for single novel object
Fig.14 Visualization results of grasping detection for multiple novel objects predicted by RTGN
Fig.15 Physical platform of robotic grasping experiment
Fig.16 Objects used in robotic grasping experiment
Fig.17 Robotic grasping of novel objects
Object | As | Object | As
Orange | 100% (20/20) | Candy | 100% (20/20)
Biscuit | 100% (20/20) | Plastic plate | 100% (20/20)
Mouse | 85% (17/20) | Plastic bowl | 95% (19/20)
Paper cup | 90% (18/20) | Umbrella | 80% (16/20)
Alcohol spray bottle | 90% (18/20) | Adhesive tape | 100% (20/20)
AA battery | 100% (20/20) | Cylindrical block | 100% (20/20)
Screwdriver | 100% (20/20) | Milk carton | 95% (19/20)
Toothpaste box | 100% (20/20) | Toothpaste | 90% (18/20)
Laundry detergent bottle | 100% (20/20) | Brush | 100% (20/20)
Facial cleanser | 95% (19/20) | Toner bottle | 100% (20/20)
Tab.4 Statistical results of robotic grasping experiment
[1] LIU Yaxin, WANG Siyao, YAO Yufeng, et al. Recent researches on robot autonomous grasp technology [J]. Control and Decision, 2020, 35(12): 2817-2828.
[2] BOHG J, MORALES A, ASFOUR T, et al. Data-driven grasp synthesis: a survey [J]. IEEE Transactions on Robotics, 2014, 30(2): 289-309. doi: 10.1109/TRO.2013.2289018.
[3] ZHONG Xungao, XU Min, ZHONG Xunyu, et al. Multimodal features deep learning for robotic potential grasp recognition [J]. Acta Automatica Sinica, 2016, 42(7): 1022-1029.
[4] DU Xuedan, CAI Yinghao, LU Tao, et al. A robotic grasping method based on deep learning [J]. Robot, 2017, 39(6): 820-828.
[5] JIANG Y, MOSESON S, SAXENA A. Efficient grasping from RGBD images: learning using a new rectangle representation [C]// IEEE International Conference on Robotics and Automation. Shanghai: IEEE, 2011: 3304-3311.
[6] LENZ I, LEE H, SAXENA A. Deep learning for detecting robotic grasps [J]. The International Journal of Robotics Research, 2015, 34(4/5): 705-724.
[7] ZHANG Yunzhou, LI Qi, CAO He, et al. Single-stage grasp pose detection of manipulator based on multi-level features [J]. Control and Decision, 2021, 36(8): 1815-1824.
[8] XIA Jing, QIAN Kun, MA Xudong, et al. Fast planar grasp pose detection for robot based on cascaded deep convolutional neural networks [J]. Robot, 2018, 40(6): 794-802.
[9] YU Qunchao, SHANG Weiwei, ZHANG Chi. Object grasp detecting based on three-level convolution neural network [J]. Robot, 2018, 40(5): 762-768.
[10] REDMON J, ANGELOVA A. Real-time grasp detection using convolutional neural networks [C]// IEEE International Conference on Robotics and Automation. Seattle: IEEE, 2015: 1316-1322.
[11] KUMRA S, KANAN C. Robotic grasp detection using deep convolutional neural networks [C]// IEEE/RSJ International Conference on Intelligent Robots and Systems. Vancouver: IEEE, 2017: 769-776.
[12] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition [C]// IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas: IEEE, 2016: 770-778.
[13] GUO D, SUN F C, LIU H P, et al. A hybrid deep architecture for robotic grasp detection [C]// IEEE International Conference on Robotics and Automation. Singapore: IEEE, 2017: 1609-1614.
[14] CHU F J, XU R N, VELA P A. Real-world multiobject, multigrasp detection [J]. IEEE Robotics and Automation Letters, 2018, 3(4): 3355-3362. doi: 10.1109/LRA.2018.2852777.
[15] ZHOU X W, LAN X G, ZHANG H B, et al. Fully convolutional grasp detection network with oriented anchor box [C]// IEEE/RSJ International Conference on Intelligent Robots and Systems. Madrid: IEEE, 2018: 7223-7230.
[16] MORRISON D, CORKE P, LEITNER J. Closing the loop for robotic grasping: a real-time, generative grasp synthesis approach [EB/OL]. (2018-05-15) [2023-02-06]. https://arxiv.org/abs/1804.05172v2.
[17] KUMRA S, JOSHI S, SAHIN F. Antipodal robotic grasping using generative residual convolutional neural network [C]// IEEE/RSJ International Conference on Intelligent Robots and Systems. Las Vegas: IEEE, 2020: 9626-9633.
[18] CHENG H, WANG Y Y, MENG M Q H. Grasp pose detection from a single RGB image [C]// IEEE/RSJ International Conference on Intelligent Robots and Systems. Prague: IEEE, 2021: 4686-4691.
[19] WANG S C, ZHOU Z L, KAN Z. When transformer meets robotic grasping: exploits context for efficient grasp detection [J]. IEEE Robotics and Automation Letters, 2022, 7(3): 8170-8177. doi: 10.1109/LRA.2022.3187261.
[20] ZHAO H S, SHI J P, QI X J, et al. Pyramid scene parsing network [C]// IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu: IEEE, 2017: 6230-6239.
[21] WANG P Q, CHEN P F, YUAN Y, et al. Understanding convolution for semantic segmentation [C]// IEEE Winter Conference on Applications of Computer Vision (WACV). Lake Tahoe: IEEE, 2018: 1451-1460.
[22] SRIVASTAVA N, HINTON G, KRIZHEVSKY A, et al. Dropout: a simple way to prevent neural networks from overfitting [J]. The Journal of Machine Learning Research, 2014, 15(1): 1929-1958.
[23] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module [C]// Proceedings of the European Conference on Computer Vision (ECCV). Munich: Springer, 2018: 3-19.
[24] HOU Q B, ZHOU D Q, FENG J S. Coordinate attention for efficient mobile network design [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville: IEEE, 2021: 13708-13717.