Journal of Zhejiang University (Engineering Science)  2025, Vol. 59 Issue (11): 2277-2284    DOI: 10.3785/j.issn.1008-973X.2025.11.006
Mechanical Engineering, Energy Engineering
6D pose estimation of binocular vision object based on 3D key point
Kaixu NING(),Qing LU,Heng YANG*(),Shaohan WANG
School of Mechanical Engineering, Taiyuan University of Science and Technology, Taiyuan 030024, China
Full text: PDF (1301 KB)   HTML
Abstract:

A binocular dataset construction method based on multi-view geometry and a 3D key point-based object 6D pose estimation network, StereoNet, were proposed to address the reliance on CAD models in traditional pose estimation methods. The 3D key points of the object were obtained with a 3D key point estimation network, into which a parallax attention module was introduced to improve the accuracy of key point prediction. The structure-from-motion (SfM) method was employed to reconstruct a sparse point cloud model of the object; the 3D points from the query image and those from the SfM model were matched by graph attention networks (GATs), and the 6D pose of the object was then computed with the RANSAC and PnP algorithms. Experimental results showed that StereoNet outperformed KeypointNet and KeyPose in 3D key point estimation, their MAE being 1.2 to 1.6 times that of StereoNet. In 6D pose estimation, StereoNet surpassed HLoc, OnePose, and Gen6D on both the 5 cm 5° and 3 cm 3° metrics, achieving an average accuracy of 82.1%, which demonstrates the strong generalization ability and accuracy of the network.
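As a point of reference for the final stage of this pipeline, the sketch below shows how a 6D pose can be recovered with OpenCV's RANSAC-based PnP solver once matching has paired key points detected in the query image with 3D points from the SfM model. The function name, thresholds, and array layouts are illustrative assumptions, not the paper's implementation.

import cv2
import numpy as np

def solve_pose(model_pts_3d, image_pts_2d, K, dist_coeffs=None):
    """Estimate an object pose from matched 2D-3D correspondences.

    model_pts_3d : (N, 3) array of 3D points from the SfM model
    image_pts_2d : (N, 2) array of matched key points in the query image
    K            : (3, 3) camera intrinsic matrix
    Returns (R, t): rotation matrix and translation vector (model -> camera).
    """
    if dist_coeffs is None:
        dist_coeffs = np.zeros(4, dtype=np.float32)  # assume rectified, undistorted images
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        model_pts_3d.astype(np.float32),
        image_pts_2d.astype(np.float32),
        K, dist_coeffs,
        iterationsCount=1000,        # RANSAC iterations (assumed value)
        reprojectionError=3.0,       # inlier threshold in pixels (assumed value)
        flags=cv2.SOLVEPNP_EPNP)
    if not ok:
        raise RuntimeError("PnP failed: too few consistent correspondences")
    R, _ = cv2.Rodrigues(rvec)       # axis-angle vector -> 3x3 rotation matrix
    return R, tvec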

Key words: 6D pose    dataset creation    binocular vision    3D key point matching    perspective-n-point (PnP) algorithm
Received: 2024-11-04    Published: 2025-10-30
CLC: TP 183
Corresponding author: Heng YANG    E-mail: 2654223903@qq.com; 93328173@qq.com
About the author: Kaixu NING (1999—), male, master's degree candidate, engaged in research on object 6D pose estimation. orcid.org/0009-0003-0633-2874. E-mail: 2654223903@qq.com

Cite this article:

Kaixu NING, Qing LU, Heng YANG, Shaohan WANG. 6D pose estimation of binocular vision object based on 3D key point. Journal of Zhejiang University (Engineering Science), 2025, 59(11): 2277-2284.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2025.11.006        https://www.zjujournals.com/eng/CN/Y2025/V59/I11/2277

Fig. 1  Flowchart of dataset construction
Fig. 2  Automatic binocular image acquisition platform
Fig. 3  Image annotation results
Fig. 4  CAD models of some objects
Fig. 5  Overall architecture of the StereoNet network
Fig. 6  Sparse point cloud models of some objects
Fig. 7  3D key point estimation network
Fig. 8  Structural framework of the GATs network
Method        MAE/mm
              Box     Bottle   Cup
KeypointNet   6.4     6.0      10.5
KeyPose       6.6     5.8      9.9
StereoNet     5.0     4.7      6.1
Table 1  Category-level 3D key point estimation results
Method        MAE/mm
              Mouse   Rubber duck
KeypointNet   42.8    55.0
KeyPose       38.2    49.1
StereoNet     18.2    13.6
Table 2  3D key point estimation results for unseen objects
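As a reading aid for Tables 1 and 2, the MAE can be interpreted as the mean Euclidean distance, in millimetres, between predicted and ground-truth 3D key points. A minimal sketch under that interpretation, with assumed array shapes:

import numpy as np

def keypoint_mae(pred_kpts, gt_kpts):
    """Mean 3D key point error in millimetres.

    pred_kpts, gt_kpts : (N, K, 3) arrays holding K key points for each of
    N samples, expressed in the same metric camera frame.
    """
    errors = np.linalg.norm(pred_kpts - gt_kpts, axis=-1)  # (N, K) per-point distances
    return errors.mean()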
Method      Bottle              Mug                 Cup                 Mouse
            3 cm 3°   5 cm 5°   3 cm 3°   5 cm 5°   3 cm 3°   5 cm 5°   3 cm 3°   5 cm 5°
HLoc        0.703     0.813     0.793     0.831     0.739     0.837     0.729     0.832
OnePose     0.733     0.836     0.806     0.828     0.729     0.832     0.711     0.819
Gen6D†      0.572     0.613     0.575     0.608     0.468     0.515     0.508     0.613
Gen6D       0.591     0.633     0.598     0.631     0.502     0.583     0.566     0.631
StereoNet   0.773     0.865     0.813     0.844     0.791     0.854     0.787     0.845
Table 3  Comparison results of pose estimation accuracy
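The 5 cm 5° metric (and analogously 3 cm 3°) counts a pose as correct when the translation error is below 5 cm and the rotation error is below 5°; the accuracies in Table 3 are then the fraction of test frames passing this check. A minimal sketch of the per-frame check, assuming poses are given as rotation matrices and translation vectors in metres:

import numpy as np

def pose_within(R_est, t_est, R_gt, t_gt, t_thresh_m=0.05, r_thresh_deg=5.0):
    """Return True if the pose error is within the translation/rotation bounds."""
    t_err = np.linalg.norm(t_est - t_gt)                   # translation error in metres
    cos_angle = (np.trace(R_est @ R_gt.T) - 1.0) / 2.0     # angle of relative rotation
    r_err = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return t_err < t_thresh_m and r_err < r_thresh_deg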
Fig. 9  Influence of different reconstruction errors on pose estimation accuracy
Fig. 10  Influence of different object distances on pose estimation accuracy
Object   Textured              Textureless
         3 cm 3°   5 cm 5°     3 cm 3°   5 cm 5°
Cup      0.781     0.826       0.422
Box      0.798     0.848       0.459
Mug      0.801     0.832       0.403
Table 4  Comparison of pose estimation accuracy for textured and textureless objects
1 LABBÉ Y, CARPENTIER J, AUBRY M, et al. CosyPose: consistent multi-view multi-object 6D pose estimation [C]// European Conference on Computer Vision. Glasgow: Springer, 2020: 574-591.
2 PENG S D, LIU Y, HUANG Q X, et al. PVNet: pixel-wise voting network for 6DoF pose estimation [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 4561-4570.
3 HE Y S, SUN W, HUANG H B, et al. PVN3D: a deep point-wise 3D keypoints voting network for 6DoF pose estimation [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 11632-11641.
4 HE Y S, HUANG H B, FAN H Q, et al. FFB6D: a full flow bidirectional fusion network for 6D pose estimation [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 3003-3013.
5 LI Z G, WANG G, JI X Y. CDPN: coordinates-based disentangled pose network for real-time RGB-based 6-DoF object pose estimation [C]// IEEE International Conference on Computer Vision. Seoul: IEEE, 2019: 7678-7687.
6 TREMBLAY J, TO T, SUNDARALINGAM B, et al. Deep object pose estimation for semantic robotic grasping of household objects [C]// 2nd Conference on Robot Learning. Zurich: PMLR, 2018: 306-316.
7 GAO G, LAURI M, WANG Y L, et al. 6D object pose regression via supervised learning on point clouds [C]// IEEE International Conference on Robotics and Automation. Paris: IEEE, 2020: 3643-3649.
8 CHEN W, JIA X, CHANG H J, et al. G2L-Net: global to local network for real-time 6D pose estimation with embedding vector features [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 4233-4242.
9 AHMADYAN A, ZHANG L K, ABLAVATSKI A, et al. Objectron: a large scale dataset of object-centric videos in the wild with pose annotations [C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 7822-7831.
10 WANG H, SRIDHAR S, HUANG J W, et al. Normalized object coordinate space for category-level 6D object pose and size estimation [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 2642-2651.
11 ZHANG R D, DI Y, MANHARDT F, et al. SSP-Pose: symmetry-aware shape prior deformation for direct category-level object pose estimation [C]// IEEE/RSJ International Conference on Intelligent Robots and Systems. Kyoto: IEEE, 2022: 7452-7459.
12 HE Y S, WANG Y, FAN H Q, et al. FS6D: few-shot 6D pose estimation of novel objects [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 6814-6824.
13 WU J, WANG Y, XIONG R. Unseen object pose estimation via registration [C]// IEEE International Conference on Real-time Computing and Robotics. Guangzhou: IEEE, 2021: 974-979.
14 SUN J M, WANG Z H, ZHANG S Y, et al. OnePose: one-shot object pose estimation without CAD models [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 6825-6834.
15 CHEN K, JAMES S, SUI C Y, et al. StereoPose: category-level 6D transparent object pose estimation from stereo images via back-view NOCS [C]// IEEE International Conference on Robotics and Automation. London: IEEE, 2023: 2855-2861.
16 YIN M H, YAO Z L, CAO Y, et al. Disentangled non-local neural networks [C]// European Conference on Computer Vision. Glasgow: Springer, 2020: 191-207.
17 WANG Y Q, YING X Y, WANG L G, et al. Symmetric parallax attention for stereo image super-resolution [C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 766-775.
18 VELIČKOVIĆ P, CUCURULL G, CASANOVA A, et al. Graph attention networks [C]// International Conference on Learning Representations. Vancouver: [s. n.], 2018.
19 SUWAJANAKORN S, SNAVELY N, TOMPSON J J, et al. Discovery of latent 3D keypoints via end-to-end geometric reasoning [C]// Proceedings of the 32nd International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc, 2018: 2067-2074.
20 LIU X Y, JONSCHKOWSKI R, ANGELOVA A, et al. KeyPose: multi-view 3D labeling and keypoint estimation for transparent objects [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 11602-11610.
21 SARLIN P E, CADENA C, SIEGWART R, et al. From coarse to fine: robust hierarchical localization at large scale [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 12716-12725.