Journal of Zhejiang University (Engineering Science)  2025, Vol. 59 Issue (11): 2277-2284    DOI: 10.3785/j.issn.1008-973X.2025.11.006
Mechanical Engineering, Energy Engineering
6D pose estimation of binocular vision object based on 3D key point
Kaixu NING(),Qing LU,Heng YANG*(),Shaohan WANG
School of Mechanical Engineering, Taiyuan University of Science and Technology, Taiyuan 030024, China
Full text: PDF (1301 KB)   HTML
Abstract:

A binocular dataset construction method based on multi-view geometry and a 3D key point-based object 6D pose estimation network, StereoNet, were proposed to address the reliance on CAD models in traditional pose estimation methods. The 3D key points of the object were obtained with a 3D key point estimation network, into which a parallax attention module was introduced to improve the accuracy of key point prediction. The structure-from-motion (SfM) method was employed to reconstruct a sparse point cloud model of the object; the 3D points from the query image and those from the SfM model were matched by graph attention networks (GATs), and the 6D pose of the object was then computed with the RANSAC and PnP algorithms. Experimental results showed that StereoNet outperformed KeypointNet and KeyPose in 3D key point estimation, their MAE being 1.2 to 1.6 times that of StereoNet. In 6D pose estimation, StereoNet surpassed HLoc, OnePose, and Gen6D on both the 5 cm 5° and 3 cm 3° metrics, achieving an average accuracy of 82.1%, which demonstrates the strong generalization ability and accuracy of the network.
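As a point of reference for the final stage of this pipeline, the sketch below shows how a 6D pose can be recovered with OpenCV's RANSAC-based PnP solver once matching has paired key points detected in the query image with 3D points from the SfM model. The function name, thresholds, and array layouts are illustrative assumptions, not the paper's implementation.

import cv2
import numpy as np

def solve_pose(model_pts_3d, image_pts_2d, K, dist_coeffs=None):
    """Estimate an object pose from matched 2D-3D correspondences.

    model_pts_3d : (N, 3) array of 3D points from the SfM model
    image_pts_2d : (N, 2) array of matched key points in the query image
    K            : (3, 3) camera intrinsic matrix
    Returns (R, t): rotation matrix and translation vector (model -> camera).
    """
    if dist_coeffs is None:
        dist_coeffs = np.zeros(4, dtype=np.float32)  # assume rectified, undistorted images
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        model_pts_3d.astype(np.float32),
        image_pts_2d.astype(np.float32),
        K, dist_coeffs,
        iterationsCount=1000,        # RANSAC iterations (assumed value)
        reprojectionError=3.0,       # inlier threshold in pixels (assumed value)
        flags=cv2.SOLVEPNP_EPNP)
    if not ok:
        raise RuntimeError("PnP failed: too few consistent correspondences")
    R, _ = cv2.Rodrigues(rvec)       # axis-angle vector -> 3x3 rotation matrix
    return R, tvec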

Key words: 6D pose    dataset creation    binocular vision    3D key point matching    perspective-n-point (PnP) algorithm
Received: 2024-11-04    Published: 2025-10-30
CLC: TP 183
Corresponding author: Heng YANG    E-mail: 2654223903@qq.com; 93328173@qq.com
About the author: Kaixu NING (1999—), male, master's degree candidate, engaged in research on object 6D pose estimation. orcid.org/0009-0003-0633-2874. E-mail: 2654223903@qq.com

Cite this article:

Kaixu NING, Qing LU, Heng YANG, Shaohan WANG. 6D pose estimation of binocular vision object based on 3D key point. Journal of Zhejiang University (Engineering Science), 2025, 59(11): 2277-2284.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2025.11.006        https://www.zjujournals.com/eng/CN/Y2025/V59/I11/2277

Fig. 1  Flowchart of dataset construction
Fig. 2  Automatic binocular image acquisition platform
Fig. 3  Image annotation results
Fig. 4  CAD models of some objects
Fig. 5  Overall architecture of the StereoNet network
Fig. 6  Sparse point cloud models of some objects
Fig. 7  3D key point estimation network
Fig. 8  Structural framework of the GATs network
Method        MAE/mm
              Box     Bottle   Cup
KeypointNet   6.4     6.0      10.5
KeyPose       6.6     5.8      9.9
StereoNet     5.0     4.7      6.1
Table 1  Category-level 3D key point estimation results
Method        MAE/mm
              Mouse   Rubber duck
KeypointNet   42.8    55.0
KeyPose       38.2    49.1
StereoNet     18.2    13.6
Table 2  3D key point estimation results for unseen objects
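As a reading aid for Tables 1 and 2, the MAE can be interpreted as the mean Euclidean distance, in millimetres, between predicted and ground-truth 3D key points. A minimal sketch under that interpretation, with assumed array shapes:

import numpy as np

def keypoint_mae(pred_kpts, gt_kpts):
    """Mean 3D key point error in millimetres.

    pred_kpts, gt_kpts : (N, K, 3) arrays holding K key points for each of
    N samples, expressed in the same metric camera frame.
    """
    errors = np.linalg.norm(pred_kpts - gt_kpts, axis=-1)  # (N, K) per-point distances
    return errors.mean()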
Method      Bottle              Mug                 Cup                 Mouse
            3 cm 3°   5 cm 5°   3 cm 3°   5 cm 5°   3 cm 3°   5 cm 5°   3 cm 3°   5 cm 5°
HLoc        0.703     0.813     0.793     0.831     0.739     0.837     0.729     0.832
OnePose     0.733     0.836     0.806     0.828     0.729     0.832     0.711     0.819
Gen6D†      0.572     0.613     0.575     0.608     0.468     0.515     0.508     0.613
Gen6D       0.591     0.633     0.598     0.631     0.502     0.583     0.566     0.631
StereoNet   0.773     0.865     0.813     0.844     0.791     0.854     0.787     0.845
Table 3  Comparison results of pose estimation accuracy
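The 5 cm 5° metric (and analogously 3 cm 3°) counts a pose as correct when the translation error is below 5 cm and the rotation error is below 5°; the accuracies in Table 3 are then the fraction of test frames passing this check. A minimal sketch of the per-frame check, assuming poses are given as rotation matrices and translation vectors in metres:

import numpy as np

def pose_within(R_est, t_est, R_gt, t_gt, t_thresh_m=0.05, r_thresh_deg=5.0):
    """Return True if the pose error is within the translation/rotation bounds."""
    t_err = np.linalg.norm(t_est - t_gt)                   # translation error in metres
    cos_angle = (np.trace(R_est @ R_gt.T) - 1.0) / 2.0     # angle of relative rotation
    r_err = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return t_err < t_thresh_m and r_err < r_thresh_deg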
Fig. 9  Influence of different reconstruction errors on pose estimation accuracy
Fig. 10  Influence of different object distances on pose estimation accuracy
Object   Textured              Textureless
         3 cm 3°   5 cm 5°     3 cm 3°   5 cm 5°
Cup      0.781     0.826       0.422
Box      0.798     0.848       0.459
Mug      0.801     0.832       0.403
Table 4  Comparison of pose estimation accuracy for textured and textureless objects
1 LABBÉ Y, CARPENTIER J, AUBRY M, et al. CosyPose: consistent multi-view multi-object 6D pose estimation [C]// European Conference on Computer Vision. Glasgow: Springer, 2020: 574-591.
2 PENG S D, LIU Y, HUANG Q X, et al. PVNet: pixel-wise voting network for 6DoF pose estimation [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 4561-4570.
3 HE Y S, SUN W, HUANG H B, et al. PVN3D: a deep point-wise 3D keypoints voting network for 6DoF pose estimation [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 11632-11641.
4 HE Y S, HUANG H B, FAN H Q, et al. FFB6D: a full flow bidirectional fusion network for 6D pose estimation [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 3003-3013.
5 LI Z G, WANG G, JI X Y. CDPN: coordinates-based disentangled pose network for real-time RGB-based 6-DoF object pose estimation [C]// IEEE International Conference on Computer Vision. Seoul: IEEE, 2019: 7678-7687.
6 TREMBLAY J, TO T, SUNDARALINGAM B, et al. Deep object pose estimation for semantic robotic grasping of household objects [C]// 2nd Conference on Robot Learning. Zurich: PMLR, 2018: 306-316.
7 GAO G, LAURI M, WANG Y L, et al. 6D object pose regression via supervised learning on point clouds [C]// IEEE International Conference on Robotics and Automation. Paris: IEEE, 2020: 3643-3649.
8 CHEN W, JIA X, CHANG H J, et al. G2L-Net: global to local network for real-time 6D pose estimation with embedding vector features [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 4233-4242.
9 AHMADYAN A, ZHANG L K, ABLAVATSKI A, et al. Objectron: a large scale dataset of object-centric videos in the wild with pose annotations [C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 7822-7831.
10 WANG H, SRIDHAR S, HUANG J W, et al. Normalized object coordinate space for category-level 6D object pose and size estimation [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 2642-2651.
11 ZHANG R D, DI Y, MANHARDT F, et al. SSP-Pose: symmetry-aware shape prior deformation for direct category-level object pose estimation [C]// IEEE/RSJ International Conference on Intelligent Robots and Systems. Kyoto: IEEE, 2022: 7452-7459.
12 HE Y S, WANG Y, FAN H Q, et al. FS6D: few-shot 6D pose estimation of novel objects [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 6814-6824.
13 WU J, WANG Y, XIONG R. Unseen object pose estimation via registration [C]// IEEE International Conference on Real-time Computing and Robotics. Guangzhou: IEEE, 2021: 974-979.
14 SUN J M, WANG Z H, ZHANG S Y, et al. OnePose: one-shot object pose estimation without CAD models [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 6825-6834.
15 CHEN K, JAMES S, SUI C Y, et al. StereoPose: category-level 6D transparent object pose estimation from stereo images via back-view NOCS [C]// IEEE International Conference on Robotics and Automation. London: IEEE, 2023: 2855-2861.
16 YIN M H, YAO Z L, CAO Y, et al. Disentangled non-local neural networks [C]// European Conference on Computer Vision. Glasgow: Springer, 2020: 191-207.
17 WANG Y Q, YING X Y, WANG L G, et al. Symmetric parallax attention for stereo image super-resolution [C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 766-775.
18 VELIČKOVIĆ P, CUCURULL G, CASANOVA A, et al. Graph attention networks [C]// International Conference on Learning Representations. Vancouver: [s. n.], 2018.
19 SUWAJANAKORN S, SNAVELY N, TOMPSON J J, et al. Discovery of latent 3D keypoints via end-to-end geometric reasoning [C]// Proceedings of the 32nd International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc, 2018: 2067-2074.
20 LIU X Y, JONSCHKOWSKI R, ANGELOVA A, et al. KeyPose: multi-view 3D labeling and keypoint estimation for transparent objects [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 11602-11610.
21 SARLIN P E, CADENA C, SIEGWART R, et al. From coarse to fine: robust hierarchical localization at large scale [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 12716-12725.