Journal of ZheJiang University (Engineering Science)  2023, Vol. 57 Issue (11): 2179-2187    DOI: 10.3785/j.issn.1008-973X.2023.11.005
    
Binocular vision object 6D pose estimation based on recurrent neural network
Heng YANG1, Zhuo LI1, Zhong-yuan KANG2, Bing TIAN1, Qing DONG1
1. College of Mechanical Engineering, Taiyuan University of Science and Technology, Taiyuan 030024, China
2. College of Mechanical Engineering, Chongqing Agricultural Mechanization School, Chongqing 402160, China

Abstract  

A method for building a binocular dataset and a 6D pose estimation network named Binocular-RNN were proposed to address the low accuracy of current object 6D pose estimation. The existing images in the YCB-Video Dataset were treated as the content captured by the left camera of a binocular system. The corresponding 3D object models in the YCB-Video Dataset were imported into OpenGL, and the parameters of each object were supplied to render the synthetic images captured by the virtual right camera of the binocular system. In Binocular-RNN, a monocular prediction network extracted geometric features from the left and right images of the binocular dataset, and a recurrent neural network fused these geometric features to predict the 6D pose of the objects. Binocular-RNN was compared with other pose estimation methods using the average distance of model points (ADD), the average nearest point distance (ADDS), the translation error and the angle error. Results show that, when the network was trained on a single object, the ADD or ADDS score of Binocular-RNN was 2.66 times that of PoseCNN and 1.15 times that of GDR-Net. Furthermore, Binocular-RNN trained with physically based real-time rendering data (Real+PBR) outperformed DeepIM, a method based on iterative 6D pose matching with deep neural networks.
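The right-view synthesis described above amounts to a rigid-body change of reference frame: an object pose given in the left camera frame is re-expressed in the frame of a virtual right camera offset by the stereo baseline, and the model is then rendered from that pose. A minimal NumPy sketch of this step, assuming a rectified rig with a purely horizontal baseline; the function name and baseline value are illustrative, not taken from the paper:

```python
import numpy as np

def pose_in_right_camera(R_left, t_left, baseline=0.06):
    """Re-express an object pose, given in the left-camera frame,
    in the frame of a virtual right camera shifted along +x.

    R_left : (3, 3) object rotation in the left camera frame
    t_left : (3,) object translation in the left camera frame
    baseline : horizontal stereo baseline in metres (illustrative)
    """
    # With a pure horizontal offset between the cameras, the rotation
    # is unchanged and the translation shifts by the baseline along x.
    R_right = R_left.copy()
    t_right = t_left - np.array([baseline, 0.0, 0.0])
    return R_right, t_right
```

With the identity rotation and an object one metre in front of the rig, a 60 mm baseline simply shifts the object 60 mm along -x in the right camera's frame before rendering.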



Key words: 6D pose; monocular vision; active vision; recurrent neural network; YCB-Video Dataset
Received: 20 November 2022      Published: 11 December 2023
CLC:  TP 391.4  
Cite this article:

Heng YANG, Zhuo LI, Zhong-yuan KANG, Bing TIAN, Qing DONG. Binocular vision object 6D pose estimation based on recurrent neural network. Journal of ZheJiang University (Engineering Science), 2023, 57(11): 2179-2187.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2023.11.005     OR     https://www.zjujournals.com/eng/Y2023/V57/I11/2179


Fig.1 Production process of binocular dataset
Fig.2 Binocular-RNN overall framework
Fig.3 Effect of MSRA quantity on model prediction results
Fig.4 Effect of PnP variant on Synthetic Sphere dataset
| Method | m | Acc(ADD(S))/% | AUC(ADDS)/% | AUC(ADD(S))/% | ts/ms |
| --- | --- | --- | --- | --- | --- |
| Only-RNN | 1 | - | - | - | - |
| Only-CNN | 1 | 18.4 | 62.3 | 59.6 | 35 |
| Only-CNN | M | 15.6 | - | - | - |
| Binocular-RNN | 1 | 56.7 | 90.8 | 85.2 | 23 |
| Binocular-RNN | M | 70.5 | 93.4 | 89.6 | - |
| PoseCNN[19] | 1 | 21.3 | 75.9 | 61.3 | 24 |
| GDR-Net | 1 | 49.1 | 89.1 | 80.2 | 22 |
| GDR-Net | M | 60.1 | 91.6 | 84.4 | - |
| Single-Stage[16] | M | 53.9 | - | - | - |
| DeepIM[15] | 1 | - | 88.1 | 81.9 | 25 |
| CosyPose[20] | 1 | - | 89.8 | 84.5 | 25 |
Tab.1 Comparison of Binocular-RNN with other methods on YCB-Video Dataset
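The ADD and ADDS scores reported in Tab.1 compare the model points transformed by the estimated pose against the same points transformed by the ground-truth pose: ADD averages the distances between corresponding points, while ADDS (used for symmetric objects) averages the distance from each ground-truth point to the nearest estimated point. A minimal NumPy sketch; function and variable names are illustrative:

```python
import numpy as np

def add_metric(pts, R_gt, t_gt, R_est, t_est):
    """ADD: mean distance between corresponding model points
    under the ground-truth and estimated poses."""
    p_gt = pts @ R_gt.T + t_gt
    p_est = pts @ R_est.T + t_est
    return np.linalg.norm(p_gt - p_est, axis=1).mean()

def adds_metric(pts, R_gt, t_gt, R_est, t_est):
    """ADD-S: mean distance from each ground-truth point to the
    nearest estimated point (for symmetric objects)."""
    p_gt = pts @ R_gt.T + t_gt
    p_est = pts @ R_est.T + t_est
    # Pairwise distances via broadcasting, then nearest-neighbour min.
    d = np.linalg.norm(p_gt[:, None, :] - p_est[None, :, :], axis=2)
    return d.min(axis=1).mean()
```

In the ADD(S) accuracy commonly reported on YCB-Video, a pose is typically counted correct when this distance falls below a fraction of the object diameter (often 10%).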
| Method | m | Training data | Ape | Can | Cat | Driller | Duck | Eggbox | Glue | Holep | MEAN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PoseCNN | 1 | Real+syn | 9.6 | 45.2 | 0.9 | 41.4 | 19.6 | 22.0 | 38.5 | 22.1 | 24.9 |
| PVNet | M | Real+syn | 15.8 | 63.3 | 16.7 | 65.7 | 25.2 | 50.2 | 49.6 | 36.1 | 40.8 |
| Single-Stage | M | Real+syn | 19.2 | 65.1 | 18.9 | 69.0 | 25.3 | 52.0 | 51.4 | 45.6 | 43.3 |
| GDR-Net | M | Real+syn | 41.3 | 71.1 | 18.2 | 54.6 | 41.7 | 40.2 | 59.5 | 52.6 | 47.4 |
| Binocular-RNN | 1 | Real+syn | 49.6 | 78.2 | 40.3 | 67.4 | 50.6 | 45.4 | 60.5 | 68.2 | 57.5 |
| Binocular-RNN | M | Real+syn | 41.3 | 79.1 | 42.8 | 71.2 | 55.3 | 48.3 | 65.7 | 70.5 | 51.6 |
| Binocular-RNN | 1 | Real+PBR | 48.6 | 82.3 | 51.4 | 73.5 | 61.2 | 58.3 | 70.5 | 72.6 | 64.8 |
| Binocular-RNN | M | Real+PBR | 50.4 | 85.7 | 58.3 | 76.2 | 68.3 | 62.1 | 75.8 | 74.2 | 68.9 |
| DPOD | 1 | Real+syn | - | - | - | - | - | - | - | - | 47.3 |
| DeepIM | 1 | Real+syn | 59.2 | 63.5 | 26.2 | 55.6 | 52.4 | 63.0 | 71.7 | 52.5 | 55.5 |
Tab.2 Comparison of Binocular-RNN with other methods on LM-O (unit: %)
Fig.5 Effect of distance on accuracy
[1]   SUNDERMEYER M, MARTON Z C, DURNER M, et al Augmented autoencoders: implicit 3D orientation learning for 6D object detection[J]. International Journal of Computer Vision, 2020, 128: 714- 729
doi: 10.1007/s11263-019-01243-8
[2]   KEHL W, MANHARDT F, TOMBARI F, et al. SSD-6D: making RGB-based 3D detection and 6D pose estimation great again [C]// IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 1521-1529.
[3]   DI Y, MANHARDT F, WANG G, et al. SO-Pose: exploiting self-occlusion for direct 6D pose estimation [C]// International Conference on Computer Vision. Montreal: IEEE, 2021: 12396-12405.
[4]   WANG G, MANHARDT F, TOMBARI F, et al. GDR-Net: geometry-guided direct regression network for monocular 6D object pose estimation [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Montreal: IEEE, 2021: 16611-16621.
[5]   刘城. 手腕型操作中操作者手部的主动视觉追踪方法研究[D]. 武汉: 武汉理工大学, 2020.
LIU Cheng. Research on active vision tracking method of operator’s hand in wrist-type operation[D]. Wuhan: Wuhan University of Technology, 2020.
[6]   戈振鹏. 基于主动视觉和强化学习的机械臂装配研究[D]. 成都: 电子科技大学, 2022.
GE Zhen-peng. Research on robotic assembly based on active vision and reinforcement learning[D]. Chengdu: University of Electronic Science and Technology of China, 2022.
[7]   ZHOU Y, BARNES C, LU J W, et al. On the continuity of rotation representations in neural networks [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 5745-5753.
[8]   KUNDU A, LI Y, REHG J M. 3D-RCNN: instance-level 3D object reconstruction via render-and-compare [C]// IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 3559-3568.
[9]   LI Z G, WANG G, JI X Y. CDPN: coordinates-based disentangled pose network for real-time RGB-based 6-DoF object pose estimation [C]// IEEE International Conference on Computer Vision. Seoul: IEEE, 2019: 7678-7687.
[10]   MNIH V, HEESS N, GRAVES A, et al. Recurrent models of visual attention [C]// Advances in Neural Information Processing Systems. Montreal: [s.n.], 2014: 136-145.
[11]   SHOTTON J, GLOCKER B, ZACH C, et al. Scene coordinate regression forests for camera relocalization in RGB-D images [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Portland: IEEE, 2013: 2930-2937.
[12]   PASZKE A, GROSS S, MASSA F, et al. PyTorch: an imperative style, high-performance deep learning library [C]// Advances in Neural Information Processing Systems. Vancouver: [s.n.], 2019: 8026-8037.
[13]   LIU L Y, JIANG H M, HE P C, et al. On the variance of the adaptive learning rate and beyond [C]// International Conference on Learning Representations. Vancouver: [s.n.], 2020.
[14]   MUSTAFA W, PUGEAULT N, KRÜGER N. Multi-view object recognition using view-point invariant shape relations and appearance information [C]// IEEE International Conference on Robotics and Automation. Karlsruhe: IEEE, 2013: 4230-4237.
[15]   LI Y, WANG G, JI X Y, et al. DeepIM: deep iterative matching for 6D pose estimation [C]// Proceedings of the European Conference on Computer Vision. Munich: [s.n.], 2018: 683-698.
[16]   HU Y L, FUA P, WANG W, et al. Single-stage 6D object pose estimation [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 2930-2939.
[17]   LEPETIT V, MORENO-NOGUER F, FUA P. EPnP: an accurate O(n) solution to the PnP problem [J]. International Journal of Computer Vision, 2009, 81: 155-166.
[18]   吕成志. 面向复杂场景的目标六自由度姿态估计关键技术研究[D]. 广州: 华南理工大学, 2020.
LV Cheng-zhi. Research on key technologies of object six degree of freedom pose estimation for complex scenes[D]. Guangzhou: South China University of Technology, 2020.
[19]   XIANG Y, SCHMIDT T, NARAYANAN V, et al. PoseCNN: a convolutional neural network for 6D object pose estimation in cluttered scenes [C]// Robotics: Science and Systems. Pittsburgh: [s.n.], 2018.
[20]   LABBÉ Y, CARPENTIER J, AUBRY M, et al. CosyPose: consistent multi-view multi-object 6D pose estimation [C]// European Conference on Computer Vision. Glasgow: [s.n.], 2020: 574-591.
[21]   刘建伟, 宋志妍. 循环神经网络研究综述[J]. 控制与决策, 2022, 37(11): 2753-2768.
LIU Jian-wei, SONG Zhi-yan. A survey of recurrent neural networks [J]. Control and Decision, 2022, 37(11): 2753-2768.
[22]   MCLAUGHLIN N, MARTINEZ DEL RINCON J, MILLER P. Recurrent convolutional network for video-based person re-identification [C]// IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 1325-1334.
[23]   胡代弟, 李锐君. 基于主动视觉的机械表面疲劳损伤裂纹检测[J]. 制造业自动化, 2022, 44(5): 170-174.
HU Dai-di, LI Rui-jun. Mechanical surface fatigue damage crack detection based on active vision [J]. Manufacturing Automation, 2022, 44(5): 170-174.
[24]   罗宇. 基于深度学习的目标姿态估计与机械臂抓取研究[D]. 广州: 广东工业大学, 2020.
LUO Yu. Research on target attitude estimation and manipulator grab based on deep learning[D]. Guangzhou: Guangdong University of Technology, 2020.