Journal of Zhejiang University (Engineering Science)  2023, Vol. 57 Issue (11): 2179-2187    DOI: 10.3785/j.issn.1008-973X.2023.11.005
Mechanical Engineering
Binocular vision object 6D pose estimation based on recurrent neural network
Heng YANG1, Zhuo LI1, Zhong-yuan KANG2, Bing TIAN1, Qing DONG1
1. College of Mechanical Engineering, Taiyuan University of Science and Technology, Taiyuan 030024, China
2. College of Mechanical Engineering, Chongqing Agricultural Mechanization School, Chongqing 402160, China
Abstract:

A binocular dataset creation method and an object 6D pose estimation network named Binocular-RNN were proposed to address the low accuracy of current object 6D pose estimation. The existing images in the YCB-Video Dataset were used as the content captured by the left camera of a binocular system. The corresponding three-dimensional object models in the YCB-Video Dataset were imported with OpenGL, the relevant parameters of each object were supplied, and the synthetic images captured by the virtual right camera of the binocular system were rendered. A monocular prediction network in Binocular-RNN extracted geometric features from the left and right images of the binocular dataset, and a recurrent neural network fused these geometric features and predicted the 6D pose of the objects. Binocular-RNN was compared with other pose estimation methods using the average distance of model points (ADD), the average nearest point distance (ADDS), the translation error and the angular error as evaluation metrics. The results show that, when the network was trained on a single object, the ADD or ADDS score of Binocular-RNN was 2.66 times that of PoseCNN and 1.15 times that of GDR-Net. Furthermore, Binocular-RNN trained with physics-based real-time rendering (Real+PBR) outperformed DeepIM, a method based on iterative 6D pose matching with deep neural networks.
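As a concrete reference for the two headline metrics, the following is a minimal NumPy sketch of ADD and ADDS as they are commonly defined in the 6D pose estimation literature (the PoseCNN evaluation protocol); the function names are illustrative, not the authors' implementation. ADD averages the distance between corresponding model points under the estimated and ground-truth poses; ADDS, used for symmetric objects, averages the distance from each ground-truth point to the nearest estimated point.

import numpy as np

def transform(points, R, t):
    # Apply a rigid transform to an (N, 3) array of model points.
    return points @ R.T + t

def add_metric(points, R_est, t_est, R_gt, t_gt):
    # ADD: mean distance between corresponding model points under the
    # estimated and ground-truth poses.
    p_est = transform(points, R_est, t_est)
    p_gt = transform(points, R_gt, t_gt)
    return np.linalg.norm(p_est - p_gt, axis=1).mean()

def adds_metric(points, R_est, t_est, R_gt, t_gt):
    # ADDS: mean distance from each ground-truth point to its nearest
    # estimated point; used for symmetric objects.
    p_est = transform(points, R_est, t_est)
    p_gt = transform(points, R_gt, t_gt)
    # Brute-force nearest neighbour; a KD-tree is preferable for dense models.
    d = np.linalg.norm(p_gt[:, None, :] - p_est[None, :, :], axis=2)
    return d.min(axis=1).mean()

A pose is then usually counted as correct when ADD (or ADDS) falls below 10% of the object diameter, which is how the Acc(ADD(S)) column of Table 1 is conventionally read.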

Key words: 6D pose    monocular vision    active vision    recurrent neural network    YCB-Video Dataset
Received: 2022-11-20    Published: 2023-12-11
CLC:  TP 391.4  
About the author: YANG Heng (1982-), male, associate professor, Ph.D., research interest in intelligent mechanical equipment. orcid.org/0009-0004-1920-8677. E-mail: 93328173@qq.com

Cite this article:

Heng YANG, Zhuo LI, Zhong-yuan KANG, Bing TIAN, Qing DONG. Binocular vision object 6D pose estimation based on recurrent neural network[J]. Journal of Zhejiang University (Engineering Science), 2023, 57(11): 2179-2187.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2023.11.005        https://www.zjujournals.com/eng/CN/Y2023/V57/I11/2179

Fig. 1  Workflow of binocular dataset creation
Fig. 2  Overall framework of Binocular-RNN
Fig. 3  Effect of MSRA count on model prediction results
Fig. 4  Effect of PnP variants on the Synthetic Sphere dataset
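Fig. 4 concerns PnP variants; a PnP solver recovers an object's rotation and translation from 2D-3D point correspondences. The snippet below is a minimal OpenCV illustration of calling the EPnP variant of reference [17], not the authors' code; the correspondence arrays and intrinsics are hypothetical placeholders.

import cv2
import numpy as np

# Hypothetical correspondences: N object-frame 3D model points and
# their detected 2D image projections in pixels.
object_points = np.random.rand(8, 3).astype(np.float32)
image_points = (np.random.rand(8, 2) * 640).astype(np.float32)

# Placeholder pinhole intrinsics (fx, fy, cx, cy).
K = np.array([[600.0,   0.0, 320.0],
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])

# EPnP [17] solves for the pose in O(n) time from the correspondences.
ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, None,
                              flags=cv2.SOLVEPNP_EPNP)
R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 rotation matrix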
Method           | Ref | m | Acc(ADD(S))/% | AUC(ADDS)/% | AUC(ADD(S))/% | ts/ms
Only-RNN         |     | 1 |               |             |               |
Only-CNN         |     | 1 | 18.4          | 62.3        | 59.6          | 35
Only-CNN         |     | M | 15.6          |             |               |
Binocular-RNN    |     | 1 | 56.7          | 90.8        | 85.2          | 23
Binocular-RNN    |     | M | 70.5          | 93.4        | 89.6          |
PoseCNN[19]      |     | 1 | 21.3          | 75.9        | 61.3          | 24
GDR-Net          |     | 1 | 49.1          | 89.1        | 80.2          | 22
GDR-Net          |     | M | 60.1          | 91.6        | 84.4          |
Single-Stage[16] |     | M | 53.9          |             |               |
DeepIM[15]       |     | 1 |               | 88.1        | 81.9          | 25
CosyPose[20]     |     | 1 |               | 89.8        | 84.5          | 25
Table 1  Comparison of Binocular-RNN with other methods on the YCB-Video Dataset
Method        | m | Training data | Ape  | Can  | Cat  | Driller | Duck | Eggbox | Glue | Holep | MEAN
PoseCNN       | 1 | Real+syn      | 9.6  | 45.2 | 0.9  | 41.4    | 19.6 | 22.0   | 38.5 | 22.1  | 24.9
PVNet         | M | Real+syn      | 15.8 | 63.3 | 16.7 | 65.7    | 25.2 | 50.2   | 49.6 | 36.1  | 40.8
Single-Stage  | M | Real+syn      | 19.2 | 65.1 | 18.9 | 69.0    | 25.3 | 52.0   | 51.4 | 45.6  | 43.3
GDR-Net       | M | Real+syn      | 41.3 | 71.1 | 18.2 | 54.6    | 41.7 | 40.2   | 59.5 | 52.6  | 47.4
Binocular-RNN | 1 | Real+syn      | 49.6 | 78.2 | 40.3 | 67.4    | 50.6 | 45.4   | 60.5 | 68.2  | 57.5
Binocular-RNN | M | Real+syn      | 41.3 | 79.1 | 42.8 | 71.2    | 55.3 | 48.3   | 65.7 | 70.5  | 51.6
Binocular-RNN | 1 | Real+PBR      | 48.6 | 82.3 | 51.4 | 73.5    | 61.2 | 58.3   | 70.5 | 72.6  | 64.8
Binocular-RNN | M | Real+PBR      | 50.4 | 85.7 | 58.3 | 76.2    | 68.3 | 62.1   | 75.8 | 74.2  | 68.9
DPOD          | 1 | Real+syn      |      |      |      |         |      |        |      |       | 47.3
DeepIM        | 1 | Real+syn      | 59.2 | 63.5 | 26.2 | 55.6    | 52.4 | 63.0   | 71.7 | 52.5  | 55.5
Table 2  Accuracy comparison of Binocular-RNN and other methods on LM-O
Fig. 5  Effect of object distance on accuracy
1 SUNDERMEYER M, MARTON Z C, DURNER M, et al. Augmented autoencoders: implicit 3D orientation learning for 6D object detection [J]. International Journal of Computer Vision, 2020, 128: 714-729. doi: 10.1007/s11263-019-01243-8
2 KEHL W, MANHARDT F, TOMBARI F, et al. SSD-6D: making RGB-based 3D detection and 6D pose estimation great again [C]// IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 1521-1529.
3 DI Y, MANHARDT F, WANG G, et al. SO-Pose: exploiting self-occlusion for direct 6D pose estimation [C]// IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 12396-12405.
4 WANG G, MANHARDT F, TOMBARI F, et al. GDR-Net: geometry-guided direct regression network for monocular 6D object pose estimation [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 16611-16621.
5 LIU Cheng. Research on active vision tracking method of operator's hand in wrist-type operation [D]. Wuhan: Wuhan University of Technology, 2020.
6 GE Zhen-peng. Research on robotic assembly based on active vision and reinforcement learning [D]. Chengdu: University of Electronic Science and Technology of China, 2022.
7 ZHOU Y, BARNES C, LU J W, et al. On the continuity of rotation representations in neural networks [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 5745-5753.
8 KUNDU A, LI Y, REHG J M. 3D-RCNN: instance-level 3D object reconstruction via render-and-compare [C]// IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 3559-3568.
9 LI Z G, WANG G, JI X Y. CDPN: coordinates-based disentangled pose network for real-time RGB-based 6-DoF object pose estimation [C]// IEEE International Conference on Computer Vision. Seoul: IEEE, 2019: 7678-7687.
10 MNIH V, HEESS N, GRAVES A, et al. Recurrent models of visual attention [C]// Advances in Neural Information Processing Systems. Montreal: [s.n.], 2014: 136-145.
11 SHOTTON J, GLOCKER B, ZACH C, et al. Scene coordinate regression forests for camera relocalization in RGB-D images [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Portland: IEEE, 2013: 2930-2937.
12 PASZKE A, GROSS S, MASSA F, et al. PyTorch: an imperative style, high-performance deep learning library [C]// Advances in Neural Information Processing Systems. Vancouver: [s.n.], 2019: 8026-8037.
13 LIU L Y, JIANG H M, HE P C, et al. On the variance of the adaptive learning rate and beyond [C]// International Conference on Learning Representations. Addis Ababa: [s.n.], 2020.
14 MUSTAFA W, PUGEAULT N, KRUGER N. Multi-view object recognition using view-point invariant shape relations and appearance information [C]// IEEE International Conference on Robotics and Automation. Karlsruhe: IEEE, 2013: 4230-4237.
15 LI Y, WANG G, JI X Y, et al. DeepIM: deep iterative matching for 6D pose estimation [C]// Proceedings of the European Conference on Computer Vision. Munich: [s.n.], 2018: 683-698.
16 HU Y L, FUA P, WANG W, et al. Single-stage 6D object pose estimation [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 2930-2939.
17 LEPETIT V, MORENO-NOGUER F, FUA P. EPnP: an accurate O(n) solution to the PnP problem [J]. International Journal of Computer Vision, 2009, 81(2): 155-166.
18 LV Cheng-zhi. Research on key technologies of object six-degree-of-freedom pose estimation for complex scenes [D]. Guangzhou: South China University of Technology, 2020.
19 XIANG Y, SCHMIDT T, NARAYANAN V, et al. PoseCNN: a convolutional neural network for 6D object pose estimation in cluttered scenes [C]// Robotics: Science and Systems. Pittsburgh: [s.n.], 2018.
20 LABBE Y, CARPENTIER J, AUBRY M, et al. CosyPose: consistent multi-view multi-object 6D pose estimation [C]// European Conference on Computer Vision. Glasgow: [s.n.], 2020: 574-591.
21 LIU Jian-wei, SONG Zhi-yan. A review of recurrent neural networks [J]. Control and Decision, 2022, 37(11): 2753-2768.
22 MCLAUGHLIN N, MARTINEZ DEL RINCON J, MILLER P. Recurrent convolutional network for video-based person re-identification [C]// IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 1325-1334.
23 HU Dai-di, LI Rui-jun. Mechanical surface fatigue damage crack detection based on active vision [J]. Manufacturing Automation, 2022, 44(5): 170-174.
24 LUO Yu. Research on target attitude estimation and manipulator grasping based on deep learning [D]. Guangzhou: Guangdong University of Technology, 2020.