浙江大学学报(工学版)  2026, Vol. 60 Issue (2): 313-321    DOI: 10.3785/j.issn.1008-973X.2026.02.009
Computer Technology and Control Engineering
动态场景下融合YOLOv11n目标检测的优化ORB-SLAM3算法
谢章郁1, 杨杰2,*, 欧阳嗣源1, 曾阳剑1
1. 江西理工大学 电气工程与自动化学院,江西 赣州 341000
2. 上海电机学院 电气学院,上海 201306
Optimized ORB-SLAM3 algorithm incorporating YOLOv11n object detection for dynamic scenes
Zhangyu XIE1, Jie YANG2,*, Siyuan OUYANG1, Yangjian ZENG1
1. School of Electrical Engineering and Automation, Jiangxi University of Science and Technology, Ganzhou 341000, China
2. School of Electrical Engineering, Shanghai Dianji University, Shanghai 201306, China
摘要:

针对传统视觉同步定位与建图(SLAM)技术在动态环境中定位精度低、鲁棒性差的问题,提出融合 YOLOv11n 目标检测的优化 ORB-SLAM3 算法. 在传统系统中融入基于开放式神经网络交换格式(ONNX) 推理的 YOLOv11n 网络,增加语义信息;利用静态区域特征点生成初始位姿,投影地图点至动态区域;结合双阶段位姿优化算法,在动态区域内筛选静态特征点及剔除动态特征点,提升位姿估计精度与增加优质特征点数量. 在原有3个线程外新增线程,利用关键帧区域像素点构建稠密地图,为后续的人机交互场景提供丰富的环境感知与理解. 在公开数据集TUM上的实验结果表明,在位姿估计精度方面,所提算法与基准模型相比最高提升98.3%. 所提算法能够有效消除动态物体对位姿估计的影响,满足稠密地图的构建需求.

关键词: ORB-SLAM3    开放式神经网络交换格式(ONNX)    YOLOv11n    双阶段位姿优化算法    稠密地图重建
Abstract:

To address the low positioning accuracy and poor robustness of traditional visual simultaneous localization and mapping (SLAM) techniques in dynamic environments, an optimized ORB-SLAM3 algorithm incorporating YOLOv11n object detection was proposed. A YOLOv11n network based on open neural network exchange (ONNX) inference was integrated into the traditional system to add semantic information. An initial pose was generated from feature points in static regions, and map points were then projected into the dynamic regions. A two-stage pose optimization algorithm was applied to retain static feature points and remove dynamic ones within the dynamic regions, which improved pose estimation accuracy and increased the number of high-quality feature points. A new thread was added beyond the original three, using pixels from keyframe regions to construct dense maps and providing rich environmental perception and understanding for subsequent human-computer interaction scenarios. Experimental results on the publicly available TUM dataset demonstrate that the proposed algorithm improves pose estimation accuracy by up to 98.3% relative to the baseline model. The proposed algorithm effectively eliminates the influence of dynamic objects on pose estimation and satisfies the requirements of dense map construction.

Key words: ORB-SLAM3    open neural network exchange (ONNX)    YOLOv11n    two-stage pose optimization algorithm    dense map reconstruction
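
The abstract describes feeding each frame through a YOLOv11n detector via ONNX inference and treating the detected (typically person) regions as dynamic before ORB feature handling. As a rough illustration only, the sketch below runs an exported YOLOv11n model with the onnxruntime Python API and converts person detections into a static-region mask. The model file name, 640×640 input size, (1, 4+C, N) output layout, person class index 0 and confidence threshold are assumptions based on common Ultralytics ONNX exports, not details taken from the paper, and non-maximum suppression is omitted for brevity.

```python
# Hedged sketch: YOLOv11n inference through ONNX Runtime to mask dynamic regions.
import cv2
import numpy as np
import onnxruntime as ort

# Model path and execution provider are placeholders, not from the paper.
session = ort.InferenceSession("yolov11n.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

def detect_dynamic_boxes(bgr_frame, conf_thres=0.5, person_class_id=0, size=640):
    """Return [x1, y1, x2, y2] person boxes in original-image coordinates."""
    h, w = bgr_frame.shape[:2]
    blob = cv2.resize(bgr_frame, (size, size))[:, :, ::-1]        # BGR -> RGB
    blob = blob.transpose(2, 0, 1)[None].astype(np.float32) / 255.0
    pred = session.run(None, {input_name: np.ascontiguousarray(blob)})[0][0]
    boxes, scores = pred[:4].T, pred[4:].T        # assumed (4+C, N) layout: cx,cy,w,h + class scores
    keep = scores[:, person_class_id] > conf_thres
    out = []
    for cx, cy, bw, bh in boxes[keep]:            # NMS omitted for brevity
        x1, y1 = (cx - bw / 2) * w / size, (cy - bh / 2) * h / size
        out.append([x1, y1, x1 + bw * w / size, y1 + bh * h / size])
    return out

def static_mask(frame_shape, boxes):
    """Binary mask: 0 inside detected dynamic boxes, 1 elsewhere (static regions)."""
    mask = np.ones(frame_shape[:2], dtype=np.uint8)
    for x1, y1, x2, y2 in boxes:
        mask[int(max(y1, 0)):int(y2), int(max(x1, 0)):int(x2)] = 0
    return mask
```

In a SLAM front end such a mask would be queried per ORB keypoint, so that only features falling in static regions contribute to the initial pose estimate described above.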
Received: 2025-07-14    Published: 2026-02-03
CLC:  TP 751  
Funding: National Key Research and Development Program of China (2024YFB4303203-5).
Corresponding author: Jie YANG. E-mail: 2630777181@qq.com; yangjie@jxust.edu.cn
First author: Zhangyu XIE (born 2002), male, master's student, engaged in visual SLAM research. orcid.org/0009-0004-5569-0501. E-mail: 2630777181@qq.com

Cite this article:

谢章郁,杨杰,欧阳嗣源,曾阳剑. 动态场景下融合YOLOv11n目标检测的优化ORB-SLAM3算法[J]. 浙江大学学报(工学版), 2026, 60(2): 313-321.

Zhangyu XIE, Jie YANG, Siyuan OUYANG, Yangjian ZENG. Optimized ORB-SLAM3 algorithm incorporating YOLOv11n object detection for dynamic scenes. Journal of Zhejiang University (Engineering Science), 2026, 60(2): 313-321.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2026.02.009        https://www.zjujournals.com/eng/CN/Y2026/V60/I2/313

Fig. 1  System framework of the optimized ORB-SLAM3 algorithm incorporating YOLOv11n object detection
Fig. 2  Triggering workflow of the two-stage pose optimization algorithm
Fig. 3  Pose-estimation update process of the constant-velocity tracking model
Fig. 4  Two-stage pose optimization workflow in the constant-velocity tracking model
Fig. 5  Reference keyframe tracking process
Fig. 6  Two-stage pose optimization workflow in the reference-keyframe tracking model
Fig. 7  Search process for relocalization candidate keyframes
Fig. 8  Two-stage pose optimization workflow in the relocalization model
Fig. 9  Dense map reconstruction workflow
Fig. 10  Dynamic-feature removal results of different ORB-SLAM algorithms
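
Fig. 9 outlines the added dense-mapping thread, which back-projects pixels of keyframe regions into a world-frame point cloud. The snippet below is a minimal sketch of that back-projection step only, not the paper's implementation; it assumes RGB-D keyframes with known camera-to-world poses Twc from the tracking thread, pinhole intrinsics set to the TUM freiburg3 defaults, and the TUM depth scale of 5000, all of which are illustrative assumptions.

```python
# Hedged sketch: dense point-cloud construction from posed RGB-D keyframes.
import numpy as np

FX, FY, CX, CY = 535.4, 539.2, 320.1, 247.6   # assumed TUM freiburg3 pinhole intrinsics
DEPTH_SCALE = 5000.0                           # TUM convention: 16-bit depth units per metre

def keyframe_to_points(rgb, depth_u16, Twc, stride=4, max_depth=4.0):
    """Return (N, 3) world-frame points and (N, 3) colors for one keyframe."""
    v, u = np.mgrid[0:depth_u16.shape[0]:stride, 0:depth_u16.shape[1]:stride]
    z = depth_u16[v, u].astype(np.float32) / DEPTH_SCALE
    ok = (z > 0) & (z < max_depth)                       # drop invalid or far depth
    u, v, z = u[ok], v[ok], z[ok]
    pc = np.stack([(u - CX) * z / FX, (v - CY) * z / FY, z, np.ones_like(z)])
    pw = (Twc @ pc)[:3].T                                # camera frame -> world frame
    return pw, rgb[v, u].astype(np.float32) / 255.0

# Accumulating the returned points over all keyframes yields the dense map;
# a voxel-grid filter would normally be applied before visualization.
```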
Sequence      ORB-SLAM3             Object detection only   Proposed algorithm
              RMSE_A     σ          RMSE_A     σ            RMSE_A     σ
xyz           0.8976     0.4022     0.0194     0.0099       0.0152     0.0070
rpy           0.6133     0.2022     0.0352     0.0203       0.0294     0.0149
halfsphere    0.3743     0.2179     0.0459     0.0255       0.0214     0.0112
static        0.0205     0.0138     0.0064     0.0029       0.0058     0.0027
Table 1  Comparison of absolute trajectory error of different ORB-SLAM algorithms on the TUM dataset
Sequence      ORB-SLAM3             Object detection only   Proposed algorithm
              RMSE_R     σ          RMSE_R     σ            RMSE_R     σ
xyz           0.6814     0.3861     0.0348     0.0153       0.0200     0.0068
rpy           0.6109     0.2513     0.0455     0.0212       0.0356     0.0125
halfsphere    0.3875     0.2399     0.0614     0.0155       0.0262     0.0109
static        0.0791     0.0493     0.0151     0.0020       0.0116     0.0010
Table 2  Comparison of translational relative pose error of different ORB-SLAM algorithms on the TUM dataset
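
Tables 1 and 2 report the root-mean-square error (RMSE) and standard deviation σ of the absolute trajectory error (ATE) and the translational relative pose error (RPE) on the TUM walking sequences. As a minimal numpy sketch of these standard metrics, assuming time-associated estimated and ground-truth trajectories, the code below shows how such numbers are typically computed; it is not the paper's evaluation code, which would normally rely on the TUM benchmark scripts or the evo toolbox.

```python
# Hedged sketch: RMSE of ATE (after rigid alignment) and translational RPE.
import numpy as np

def ate_rmse(est_xyz, gt_xyz):
    """RMSE of ATE after a Kabsch/Umeyama-style rigid alignment of (N, 3) positions."""
    mu_e, mu_g = est_xyz.mean(0), gt_xyz.mean(0)
    U, _, Vt = np.linalg.svd((gt_xyz - mu_g).T @ (est_xyz - mu_e))
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])    # reflection guard
    R = U @ S @ Vt                                             # rotation: estimate -> ground truth
    aligned = (R @ est_xyz.T).T + (mu_g - R @ mu_e)
    return np.sqrt(((aligned - gt_xyz) ** 2).sum(1).mean())

def rpe_trans_rmse(est_T, gt_T, delta=1):
    """RMSE of translational RPE between 4x4 poses that are delta frames apart."""
    errs = []
    for i in range(len(est_T) - delta):
        d_est = np.linalg.inv(est_T[i]) @ est_T[i + delta]
        d_gt = np.linalg.inv(gt_T[i]) @ gt_T[i + delta]
        errs.append(np.linalg.norm((np.linalg.inv(d_gt) @ d_est)[:3, 3]))
    return np.sqrt(np.mean(np.square(errs)))
```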
Fig. 11  Visual comparison of absolute trajectory errors of two ORB-SLAM algorithms on different TUM sequences
Algorithm       RMSE_A                                     σ
                xyz      rpy      halfsphere  static       xyz      rpy      halfsphere  static
DS-SLAM[18]     0.0247   0.4442   0.0303      0.0081       0.0161   0.2350   0.0159      0.0036
SG-SLAM[19]     0.0152   0.0324   0.0268      0.0073       0.0075   0.0187   0.0134      0.0034
OVD-SLAM[20]    0.0135   0.0349   0.0229      0.0068       0.0068   0.0211   0.0111      0.0030
CFP-SLAM[11]    0.0141   0.0368   0.0237      0.0066       0.0072   0.0230   0.0114      0.0030
Proposed        0.0152   0.0294   0.0214      0.0058       0.0070   0.0149   0.0112      0.0027
Table 3  Comparison of absolute trajectory error of different SLAM algorithms on the TUM dataset
Fig. 12  Comparison of maps generated by different ORB-SLAM algorithms
Fig. 13  Data collection process
Fig. 14  Comparison of detection results of different ORB-SLAM algorithms in real-world scenes
Fig. 15  Dense map constructed by the proposed algorithm in a real-world scene
Sequence      t_Y11n    t_f      t_t      t_a
xyz           35.5      16.2     26.1     77.8
rpy           37.6      15.3     23.4     76.3
halfsphere    35.5      22.6     33.1     91.2
static        35.7      21.7     14.9     72.3
Table 4  Runtime of each module of the proposed algorithm on different TUM sequences
1 CADENA C, CARLONE L, CARRILLO H, et al Past, present, and future of simultaneous localization and mapping: toward the robust-perception age[J]. IEEE Transactions on Robotics, 2016, 32 (6): 1309- 1332
doi: 10.1109/TRO.2016.2624754
2 王朋, 郝伟龙, 倪翠, 等 视觉SLAM方法综述[J]. 北京航空航天大学学报, 2024, 50 (2): 359- 367
WANG Peng, HAO Weilong, NI Cui, et al An overview of visual SLAM methods[J]. Journal of Beijing University of Aeronautics and Astronautics, 2024, 50 (2): 359- 367
doi: 10.13700/j.bh.1001-5965.2022.0376
3 QIN T, LI P, SHEN S VINS-mono: a robust and versatile monocular visual-inertial state estimator[J]. IEEE Transactions on Robotics, 2018, 34 (4): 1004- 1020
doi: 10.1109/TRO.2018.2853729
4 ENGEL J, KOLTUN V, CREMERS D Direct sparse odometry[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40 (3): 611- 625
doi: 10.1109/TPAMI.2017.2658577
5 KLEIN G, MURRAY D. Parallel tracking and mapping for small AR workspaces [C]// Proceedings of the 6th IEEE and ACM International Symposium on Mixed and Augmented Reality. Nara: IEEE, 2007: 225–234.
6 CARUSO D, ENGEL J, CREMERS D. Large-scale direct SLAM for omnidirectional cameras [C]// Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems. Hamburg: IEEE, 2015: 141–148.
7 MUR-ARTAL R, MONTIEL J M M, TARDÓS J D ORB-SLAM: a versatile and accurate monocular SLAM system[J]. IEEE Transactions on Robotics, 2015, 31 (5): 1147- 1163
doi: 10.1109/TRO.2015.2463671
8 FORSTER C, PIZZOLI M, SCARAMUZZA D. SVO: fast semi-direct monocular visual odometry [C]// Proceedings of the IEEE International Conference on Robotics and Automation. Hong Kong: IEEE, 2014: 15–22.
9 黄泽霞, 邵春莉 深度学习下的视觉SLAM综述[J]. 机器人, 2023, 45 (6): 756- 768
HUANG Zexia, SHAO Chunli Survey of visual SLAM based on deep learning[J]. Robot, 2023, 45 (6): 756- 768
doi: 10.13973/j.cnki.robot.220426
10 BESCOS B, FÁCIL J M, CIVERA J, et al DynaSLAM: tracking, mapping, and inpainting in dynamic scenes[J]. IEEE Robotics and Automation Letters, 2018, 3 (4): 4076- 4083
doi: 10.1109/LRA.2018.2860039
11 HU X, ZHANG Y, CAO Z, et al. CFP-SLAM: a real-time visual SLAM based on coarse-to-fine probability in dynamic environments [C]// Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems. Kyoto: IEEE, 2022: 4399–4406.
12 CHANG J, DONG N, LI D A real-time dynamic object segmentation framework for SLAM system in dynamic scenes[J]. IEEE Transactions on Instrumentation and Measurement, 2021, 70: 2513709
13 ZHANG J, HENEIN M, MAHONY R, et al. VDO-SLAM: a visual dynamic object-aware SLAM system [EB/OL]. (2021–12–14)[2025–07–03]. https://arxiv.org/pdf/2005.11052.
14 张玮奇, 王嘉, 张琳, 等 SUI-SLAM: 一种面向室内动态环境的融合语义和不确定度的视觉SLAM方法[J]. 机器人, 2024, 46 (6): 732- 742
ZHANG Weiqi, WANG Jia, ZHANG Lin, et al SUI-SLAM: a semantics and uncertainty incorporated visual SLAM algorithm towards dynamic indoor environments[J]. Robot, 2024, 46 (6): 732- 742
doi: 10.13973/j.cnki.robot.230195
15 翟伟光, 王峰, 马星宇, 等 YSG-SLAM: 动态场景下基于YOLACT的实时语义RGB-D SLAM系统[J]. 兵工学报, 2025, 46 (6): 167- 179
ZHAI Weiguang, WANG Feng, MA Xingyu, et al YSG-SLAM: a real-time semantic RGB-D SLAM based on YOLACT in dynamic scene[J]. Acta Armamentarii, 2025, 46 (6): 167- 179
doi: 10.12382/bgxb.2024.0443
16 刘钰嵩, 何丽, 袁亮, 等 动态场景下基于光流的语义RGBD-SLAM算法[J]. 仪器仪表学报, 2022, 43 (12): 139- 148
LIU Yusong, HE Li, YUAN Liang, et al Semantic RGBD-SLAM in dynamic scene based on optical flow[J]. Chinese Journal of Scientific Instrument, 2022, 43 (12): 139- 148
doi: 10.19650/j.cnki.cjsi.J2209856
17 CAMPOS C, ELVIRA R, RODRÍGUEZ J J G, et al ORB-SLAM3: an accurate open-source library for visual, visual–inertial, and multimap SLAM[J]. IEEE Transactions on Robotics, 2021, 37 (6): 1874- 1890
doi: 10.1109/TRO.2021.3075644
18 YU C, LIU Z, LIU X J, et al. DS-SLAM: a semantic visual SLAM towards dynamic environments [C]// Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems. Madrid: IEEE, 2018: 1168–1174.
19 CHENG S, SUN C, ZHANG S, et al SG-SLAM: a real-time RGB-D visual SLAM toward dynamic scenes with semantic and geometric information[J]. IEEE Transactions on Instrumentation and Measurement, 2023, 72: 7501012