Journal of ZheJiang University (Engineering Science)  2026, Vol. 60 Issue (2): 313-321    DOI: 10.3785/j.issn.1008-973X.2026.02.009
    
Optimized ORB-SLAM3 algorithm incorporating YOLOv11n object detection for dynamic scenes
Zhangyu XIE1(),Jie YANG2,*(),Siyuan OUYANG1,Yangjian ZENG1
1. School of Electrical Engineering and Automation, Jiangxi University of Science and Technology, Ganzhou 341000, China
2. School of Electrical Engineering, Shanghai Dianji University, Shanghai 201306, China

Abstract  

To address the low positioning accuracy and poor robustness of traditional visual simultaneous localization and mapping (SLAM) techniques in dynamic environments, an optimized ORB-SLAM3 algorithm incorporating YOLOv11n object detection was proposed. A YOLOv11n inference network based on the open neural network exchange (ONNX) format was integrated into the traditional system to augment semantic information. Initial poses were generated from feature points in static regions, and map points were then projected onto the dynamic regions. A two-stage pose optimization algorithm was integrated to retain static feature points and eliminate dynamic ones within the dynamic regions, which improved pose estimation accuracy and increased the number of high-quality feature points. A fourth thread was added beyond the original three, using pixels from keyframe regions to construct dense maps and thereby providing rich environmental perception and understanding for subsequent human-computer interaction scenarios. Experimental results on the publicly available TUM dataset demonstrate that the proposed algorithm improves pose estimation accuracy by up to 98.3% relative to the baseline models. The proposed algorithm effectively mitigates the impact of dynamic objects on pose estimation while satisfying the requirements of dense map construction.
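The integration described above runs YOLOv11n through an ONNX inference session inside the ORB-SLAM3 front end (which is C++); the paper itself gives no code. Below is a minimal Python sketch of the detection step only, assuming an Ultralytics-style ONNX export with a 640×640 input and a (1, 4+n_cls, 8400) output of center-format boxes; the file name yolov11n.onnx, the 0.5 confidence threshold, and the omission of non-maximum suppression are illustrative assumptions, not details from the paper.

```python
# Hedged sketch: ONNX-based YOLOv11n inference for marking dynamic regions.
# Model path, input size, and output layout are assumptions (Ultralytics-style
# export); the paper's actual integration is C++ inside ORB-SLAM3.
import cv2
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("yolov11n.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

def detect_dynamic_regions(bgr_frame, conf_thresh=0.5):
    """Return (boxes, class_ids, scores); boxes are [x1, y1, x2, y2] in image pixels."""
    h, w = bgr_frame.shape[:2]
    # Plain resize for brevity; a real front end would letterbox to keep aspect ratio.
    blob = cv2.resize(bgr_frame, (640, 640)).astype(np.float32) / 255.0
    blob = blob.transpose(2, 0, 1)[np.newaxis]       # HWC -> NCHW
    out = session.run(None, {input_name: blob})[0]   # (1, 4 + n_cls, 8400) assumed
    preds = out[0].T                                 # (8400, 4 + n_cls)
    scores = preds[:, 4:].max(axis=1)
    keep = scores > conf_thresh                      # NMS omitted in this sketch
    boxes = []
    for cx, cy, bw, bh in preds[keep, :4]:
        boxes.append([(cx - bw / 2) * w / 640, (cy - bh / 2) * h / 640,
                      (cx + bw / 2) * w / 640, (cy + bh / 2) * h / 640])
    return np.array(boxes), preds[keep, 4:].argmax(axis=1), scores[keep]
```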



Key words: ORB-SLAM3; open neural network exchange (ONNX); YOLOv11n; two-stage pose optimization algorithm; dense map reconstruction
Received: 14 July 2025      Published: 03 February 2026
CLC:  TP 751; U 212
Fund:  National Key Research and Development Program of China (2024YFB4303203-5).
Corresponding Authors: Jie YANG     E-mail: 2630777181@qq.com;yangjie@jxust.edu.cn
Cite this article:

Zhangyu XIE,Jie YANG,Siyuan OUYANG,Yangjian ZENG. Optimized ORB-SLAM3 algorithm incorporating YOLOv11n object detection for dynamic scenes. Journal of ZheJiang University (Engineering Science), 2026, 60(2): 313-321.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2026.02.009     OR     https://www.zjujournals.com/eng/Y2026/V60/I2/313


Fig.1 System framework of optimized ORB-SLAM3 algorithm incorporating YOLOv11n object detection
Fig.2 Triggering process of two-stage pose optimization algorithm
Fig.3 State update process of pose estimation in constant-velocity tracking model
Fig.4 Workflow of two-stage pose optimization algorithm in constant-velocity tracking model
Fig.5 Tracking process of reference keyframe
Fig.6 Workflow of two-stage pose optimization algorithm in reference-keyframe tracking model
Fig.7 Search process for relocalization candidate keyframes
Fig.8 Workflow of two-stage pose optimization algorithm in relocalization model
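Figs. 2-8 outline the two-stage pose optimization in the three tracking modes. Per the abstract, the first stage estimates an initial pose from static-region features; the second stage projects map points into the detected (potentially dynamic) regions and decides, per point, whether it behaves statically. Below is a minimal sketch of that second-stage reprojection check, assuming a pinhole camera and a hypothetical 2-pixel error threshold; the paper's exact criterion is not given in this excerpt.

```python
import numpy as np

def filter_points_in_dynamic_boxes(map_points, obs_uv, T_cw, K, boxes, err_px=2.0):
    """Keep matches inside detected boxes only if they reproject consistently
    under the first-stage pose. map_points: (N,3) world coordinates;
    obs_uv: (N,2) measured pixels; T_cw: 4x4 world-to-camera pose estimated
    from static-region features; K: 3x3 intrinsics; boxes: [x1,y1,x2,y2] list.
    Returns a boolean mask of feature points to retain."""
    pts_h = np.hstack([map_points, np.ones((len(map_points), 1))])
    cam = (T_cw @ pts_h.T).T[:, :3]                  # map points in camera frame
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]                      # pinhole projection
    err = np.linalg.norm(uv - obs_uv, axis=1)        # reprojection error in pixels
    in_box = np.zeros(len(uv), dtype=bool)
    for x1, y1, x2, y2 in boxes:
        in_box |= ((uv[:, 0] >= x1) & (uv[:, 0] <= x2) &
                   (uv[:, 1] >= y1) & (uv[:, 1] <= y2))
    # Outside detections: keep; inside: keep only geometrically consistent points.
    return ~in_box | (err < err_px)
```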
Fig.9 Workflow of dense map reconstruction
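The dense-mapping thread of Fig. 9 builds the map from keyframe pixels. Below is a minimal back-projection sketch for one RGB-D keyframe, assuming metric depth and a pinhole model; excluding pixels inside dynamic detections via keep_mask is an assumption consistent with the abstract, and voxel filtering or fusion across keyframes is omitted.

```python
import numpy as np

def backproject_keyframe(depth, rgb, K, T_wc, keep_mask=None):
    """Back-project one keyframe into a colored world-frame point cloud.
    depth: (H,W) in meters; rgb: (H,W,3); K: 3x3 intrinsics;
    T_wc: 4x4 camera-to-world pose; keep_mask: optional (H,W) bool mask,
    e.g. False inside dynamic detections."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))   # pixel grids, shape (H,W)
    valid = depth > 0
    if keep_mask is not None:
        valid &= keep_mask
    z = depth[valid]
    x = (u[valid] - K[0, 2]) * z / K[0, 0]           # inverse pinhole model
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    pts_c = np.stack([x, y, z, np.ones_like(z)])     # 4xN, camera frame
    pts_w = (T_wc @ pts_c)[:3].T                     # Nx3, world frame
    return pts_w, rgb[valid]
```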
Fig.10 Dynamic elimination effects of different ORB-SLAM algorithms
| Sequence | ORB-SLAM3 RMSE_A | ORB-SLAM3 σ | Detection only RMSE_A | Detection only σ | Proposed RMSE_A | Proposed σ |
| --- | --- | --- | --- | --- | --- | --- |
| xyz | 0.8976 | 0.4022 | 0.0194 | 0.0099 | 0.0152 | 0.0070 |
| rpy | 0.6133 | 0.2022 | 0.0352 | 0.0203 | 0.0294 | 0.0149 |
| halfsphere | 0.3743 | 0.2179 | 0.0459 | 0.0255 | 0.0214 | 0.0112 |
| static | 0.0205 | 0.0138 | 0.0064 | 0.0029 | 0.0058 | 0.0027 |

Tab.1 Absolute trajectory error comparison of different ORB-SLAM algorithms on TUM dataset (unit: m)
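RMSE_A in Tab.1 is the root-mean-square of the absolute trajectory error, the standard TUM-benchmark metric. A minimal sketch, assuming the two trajectories are already timestamp-matched and aligned (the usual Umeyama alignment step is omitted for brevity):

```python
import numpy as np

def ate_rmse(gt_xyz, est_xyz):
    """RMSE of absolute trajectory error (RMSE_A) over matched, aligned
    translations; gt_xyz and est_xyz are (N,3) arrays in meters."""
    d = np.linalg.norm(gt_xyz - est_xyz, axis=1)     # per-pose position error
    return float(np.sqrt(np.mean(d ** 2)))
```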
| Sequence | ORB-SLAM3 RMSE_R | ORB-SLAM3 σ | Detection only RMSE_R | Detection only σ | Proposed RMSE_R | Proposed σ |
| --- | --- | --- | --- | --- | --- | --- |
| xyz | 0.6814 | 0.3861 | 0.0348 | 0.0153 | 0.0200 | 0.0068 |
| rpy | 0.6109 | 0.2513 | 0.0455 | 0.0212 | 0.0356 | 0.0125 |
| halfsphere | 0.3875 | 0.2399 | 0.0614 | 0.0155 | 0.0262 | 0.0109 |
| static | 0.0791 | 0.0493 | 0.0151 | 0.0020 | 0.0116 | 0.0010 |

Tab.2 Translational relative pose error comparison of different ORB-SLAM algorithms on TUM dataset (unit: m)
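RMSE_R in Tab.2 is the root-mean-square of the translational relative pose error: the residual between ground-truth and estimated motion over a fixed frame interval. A sketch with 4×4 homogeneous poses; the interval Δ=1 frame is an assumption, since the paper's choice is not stated in this excerpt.

```python
import numpy as np

def rpe_trans_rmse(gt_T, est_T, delta=1):
    """RMSE of translational relative pose error (RMSE_R).
    gt_T, est_T: sequences of 4x4 camera poses; delta: frame interval."""
    errs = []
    for i in range(len(gt_T) - delta):
        rel_gt = np.linalg.inv(gt_T[i]) @ gt_T[i + delta]     # true motion
        rel_est = np.linalg.inv(est_T[i]) @ est_T[i + delta]  # estimated motion
        e = np.linalg.inv(rel_gt) @ rel_est                   # residual motion
        errs.append(np.linalg.norm(e[:3, 3]))                 # translational part, m
    return float(np.sqrt(np.mean(np.square(errs))))
```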
Fig.11 Visualized comparison of absolute trajectory error between two ORB-SLAM algorithms across TUM-dataset sequences
| Algorithm | RMSE_A (xyz) | RMSE_A (rpy) | RMSE_A (halfsphere) | RMSE_A (static) | σ (xyz) | σ (rpy) | σ (halfsphere) | σ (static) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DS-SLAM[18] | 0.0247 | 0.4442 | 0.0303 | 0.0081 | 0.0161 | 0.2350 | 0.0159 | 0.0036 |
| SG-SLAM[19] | 0.0152 | 0.0324 | 0.0268 | 0.0073 | 0.0075 | 0.0187 | 0.0134 | 0.0034 |
| OVD-SLAM[20] | 0.0135 | 0.0349 | 0.0229 | 0.0068 | 0.0068 | 0.0211 | 0.0111 | 0.0030 |
| CFP-SLAM[11] | 0.0141 | 0.0368 | 0.0237 | 0.0066 | 0.0072 | 0.0230 | 0.0114 | 0.0030 |
| This work | 0.0152 | 0.0294 | 0.0214 | 0.0058 | 0.0070 | 0.0149 | 0.0112 | 0.0027 |

Tab.3 Absolute trajectory error comparison of different SLAM algorithms on TUM dataset (unit: m)
Fig.12 Map comparison across different ORB-SLAM algorithms
Fig.13 Data collection process
Fig.14 Detection performance of different ORB-SLAM algorithms in real-world scenes
Fig.15 Dense map constructed by proposed algorithm in real-world scenes
| Sequence | t_Y11n | t_ft | t_t | t_a |
| --- | --- | --- | --- | --- |
| xyz | 35.5 | 16.2 | 26.1 | 77.8 |
| rpy | 37.6 | 15.3 | 23.4 | 76.3 |
| halfsphere | 35.5 | 22.6 | 33.1 | 91.2 |
| static | 35.7 | 21.7 | 14.9 | 72.3 |

Tab.4 Per-module runtime of proposed algorithm across TUM-dataset sequences (unit: ms)
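In Tab.4, t_a matches the sum of the three preceding columns, so it reads as the per-frame total. Such per-module times could be collected with a simple wall-clock harness like the hypothetical helper below; the paper does not describe its measurement setup, and the module names are illustrative.

```python
import time
from contextlib import contextmanager

timings = {}  # module name -> list of per-frame durations in ms

@contextmanager
def timed(name):
    """Accumulate wall-clock time of a code block under the given name."""
    t0 = time.perf_counter()
    yield
    timings.setdefault(name, []).append((time.perf_counter() - t0) * 1e3)

# Per-frame usage (names mirror Tab.4's columns, hypothetically):
#   with timed("t_Y11n"):
#       boxes, cls, scores = detect_dynamic_regions(frame)
#   with timed("t_ft"):
#       ...  # two-stage feature filtering
#   mean_ms = {k: sum(v) / len(v) for k, v in timings.items()}
```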
[1] CADENA C, CARLONE L, CARRILLO H, et al. Past, present, and future of simultaneous localization and mapping: toward the robust-perception age [J]. IEEE Transactions on Robotics, 2016, 32(6): 1309-1332. doi: 10.1109/TRO.2016.2624754
[2] WANG Peng, HAO Weilong, NI Cui, et al. An overview of visual SLAM methods [J]. Journal of Beijing University of Aeronautics and Astronautics, 2024, 50(2): 359-367. doi: 10.13700/j.bh.1001-5965.2022.0376
[3] QIN T, LI P, SHEN S. VINS-Mono: a robust and versatile monocular visual-inertial state estimator [J]. IEEE Transactions on Robotics, 2018, 34(4): 1004-1020. doi: 10.1109/TRO.2018.2853729
[4] ENGEL J, KOLTUN V, CREMERS D. Direct sparse odometry [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(3): 611-625. doi: 10.1109/TPAMI.2017.2658577
[5] KLEIN G, MURRAY D. Parallel tracking and mapping for small AR workspaces [C]// Proceedings of the 6th IEEE and ACM International Symposium on Mixed and Augmented Reality. Nara: IEEE, 2008: 225-234.
[6] CARUSO D, ENGEL J, CREMERS D. Large-scale direct SLAM for omnidirectional cameras [C]// Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems. Hamburg: IEEE, 2015: 141-148.
[7] MUR-ARTAL R, MONTIEL J M M, TARDÓS J D. ORB-SLAM: a versatile and accurate monocular SLAM system [J]. IEEE Transactions on Robotics, 2015, 31(5): 1147-1163. doi: 10.1109/TRO.2015.2463671
[8] FORSTER C, PIZZOLI M, SCARAMUZZA D. SVO: fast semi-direct monocular visual odometry [C]// Proceedings of the IEEE International Conference on Robotics and Automation. Hong Kong: IEEE, 2014: 15-22.
[9] HUANG Zexia, SHAO Chunli. Survey of visual SLAM based on deep learning [J]. Robot, 2023, 45(6): 756-768. doi: 10.13973/j.cnki.robot.220426
[10] BESCOS B, FÁCIL J M, CIVERA J, et al. DynaSLAM: tracking, mapping, and inpainting in dynamic scenes [J]. IEEE Robotics and Automation Letters, 2018, 3(4): 4076-4083. doi: 10.1109/LRA.2018.2860039
[11] HU X, ZHANG Y, CAO Z, et al. CFP-SLAM: a real-time visual SLAM based on coarse-to-fine probability in dynamic environments [C]// Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems. Kyoto: IEEE, 2022: 4399-4406.
[12] CHANG J, DONG N, LI D. A real-time dynamic object segmentation framework for SLAM system in dynamic scenes [J]. IEEE Transactions on Instrumentation and Measurement, 2021, 70: 2513709.
[13] ZHANG J, HENEIN M, MAHONY R, et al. VDO-SLAM: a visual dynamic object-aware SLAM system [EB/OL]. (2021-12-14)[2025-07-03]. https://arxiv.org/pdf/2005.11052.
[14] ZHANG Weiqi, WANG Jia, ZHANG Lin, et al. SUI-SLAM: a semantics and uncertainty incorporated visual SLAM algorithm towards dynamic indoor environments [J]. Robot, 2024, 46(6): 732-742. doi: 10.13973/j.cnki.robot.230195
[15] ZHAI Weiguang, WANG Feng, MA Xingyu, et al. YSG-SLAM: a real-time semantic RGB-D SLAM based on YOLACT in dynamic scene [J]. Acta Armamentarii, 2025, 46(6): 167-179. doi: 10.12382/bgxb.2024.0443
[16] LIU Yusong, HE Li, YUAN Liang, et al. Semantic RGBD-SLAM in dynamic scene based on optical flow [J]. Chinese Journal of Scientific Instrument, 2022, 43(12): 139-148. doi: 10.19650/j.cnki.cjsi.J2209856
[17] CAMPOS C, ELVIRA R, RODRÍGUEZ J J G, et al. ORB-SLAM3: an accurate open-source library for visual, visual-inertial, and multimap SLAM [J]. IEEE Transactions on Robotics, 2021, 37(6): 1874-1890. doi: 10.1109/TRO.2021.3075644
[18] YU C, LIU Z, LIU X J, et al. DS-SLAM: a semantic visual SLAM towards dynamic environments [C]// Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems. Madrid: IEEE, 2019: 1168-1174.
[19] CHENG S, SUN C, ZHANG S, et al. SG-SLAM: a real-time RGB-D visual SLAM toward dynamic scenes with semantic and geometric information [J]. IEEE Transactions on Instrumentation and Measurement, 2023, 72: 7501012.