Journal of ZheJiang University (Engineering Science)  2025, Vol. 59 Issue (8): 1671-1679    DOI: 10.3785/j.issn.1008-973X.2025.08.014
    
TD3 mapless navigation algorithm guided by dynamic window approach
Jiale LIU1, Yali XUE1,*, Shan CUI2, Jun HONG2
1. College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
2. Shanghai Electro-Mechanical Engineering Institute, Shanghai 201109, China

Abstract  

The DWA-LSTM TD3 algorithm was proposed to address two challenges in deep reinforcement learning (DRL): the large amount of training data required and the insufficient use of sequential navigation information. Robot motion was controlled from the relative position of the target point, the robot's own velocity, and the current LiDAR data, without relying on any prior map. The dynamic window approach (DWA) was employed to guide the twin delayed deep deterministic policy gradient (TD3) algorithm during training, thereby improving the quality of the collected training data. A long short-term memory (LSTM) neural network was integrated into the policy network to strengthen the agent's ability to process sequential navigation information. A simulation environment was constructed for training and evaluation, and comparative experiments were conducted against other methods. The experimental results show that, for the same number of training steps, the DWA-LSTM TD3 algorithm achieves higher cumulative rewards and a higher navigation success rate. The fluctuation range of the navigation orientation angle was reduced, trajectories were smoother, and the safety of robot motion was improved. The algorithm efficiently completes navigation tasks across various scenarios and shows strong generalization ability.
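As a concrete illustration of the guidance mechanism described above, the following minimal Python sketch shows one way a DWA planner could supply part of the exploration actions early in TD3 training, so that the replay buffer is seeded with feasible, collision-free transitions. The names `policy.act`, `dwa_planner.best_velocity`, `guide_steps`, and `guide_prob` are assumptions chosen for illustration, not the paper's exact interface.

```python
import random

import numpy as np


def select_training_action(policy, dwa_planner, state, step,
                           guide_steps=10_000, guide_prob=0.5,
                           noise_std=0.2, max_action=1.0):
    """Choose the action executed at one training step.

    Early in training, the dynamic window approach (DWA) supplies a
    fraction of the actions so the replay buffer contains feasible,
    collision-free transitions rather than purely random exploration.
    Interface names here are illustrative, not the paper's.
    """
    if step < guide_steps and random.random() < guide_prob:
        # DWA scores sampled (v, w) pairs inside the robot's dynamic
        # window and returns the best-scoring velocity command.
        action = np.asarray(dwa_planner.best_velocity(state))
    else:
        # Standard TD3 exploration: deterministic policy output plus
        # Gaussian noise.
        action = policy.act(state) + np.random.normal(0.0, noise_std, size=2)
    return np.clip(action, -max_action, max_action)
```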



Key words: mapless navigation; dynamic window approach; deep reinforcement learning; twin delayed deep deterministic policy gradient algorithm; long short-term memory
Received: 24 May 2024      Published: 28 July 2025
CLC:  TP 242  
Fund: National Natural Science Foundation of China (62073164); Shanghai Aerospace Science and Technology Innovation Fund (SAST2022-013)
Corresponding Authors: Yali XUE     E-mail: liujiale@nuaa.edu.cn;xueyali@nuaa.edu.cn
Cite this article:

Jiale LIU,Yali XUE,Shan CUI,Jun HONG. TD3 mapless navigation algorithm guided by dynamic window approach. Journal of ZheJiang University (Engineering Science), 2025, 59(8): 1671-1679.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2025.08.014     OR     https://www.zjujournals.com/eng/Y2025/V59/I8/1671


Fig.1 Kinematic model of differential drive wheeled robot
Fig.2 Structure of TD3 network
Fig.3 Overall architecture of DWA-guided TD3 algorithm
Fig.4 LSTM network model
Fig.5 Processing model of LSTM data
Fig.6 Actor-Critic network structure
Fig.7 Training environment of navigation experiment
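Figs. 4–6 outline how the LSTM is embedded in the Actor-Critic structure. The following PyTorch sketch shows one plausible form of such an actor; the observation dimension, layer widths, and sliding-window input are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn


class LSTMActor(nn.Module):
    """Actor that feeds a short history of observations (laser scan,
    relative goal, own velocity) through an LSTM before dense layers.
    Layer sizes are illustrative; the paper's dimensions may differ."""

    def __init__(self, obs_dim=24, hidden_dim=256, action_dim=2,
                 max_action=1.0):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, action_dim), nn.Tanh(),
        )
        self.max_action = max_action

    def forward(self, obs_seq):
        # obs_seq: (batch, seq_len, obs_dim), a sliding window of the
        # most recent navigation observations.
        out, _ = self.lstm(obs_seq)
        # Use the last step's hidden state to emit (v, w) in [-1, 1],
        # then scale to the robot's velocity limits.
        return self.max_action * self.head(out[:, -1])
```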
Parameter | Value
Discount factor γ | 0.99
Soft target update rate τ | 0.005
Maximum time steps per episode MaxStep | 500
Learning rate LearningRate | 0.001
Policy noise PolicyNoise | 0.2
Replay buffer size ReplayBuffer | 10^6
Tab.1 Hyperparameter setting of DWA-LSTM TD3 algorithm
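The settings in Tab. 1 map directly onto a training configuration object. The sketch below collects them; the field names are chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class TD3Config:
    """Hyperparameters taken from Tab. 1; field names are illustrative."""
    gamma: float = 0.99            # discount factor
    tau: float = 0.005             # soft target-update rate
    max_steps: int = 500           # maximum time steps per episode
    learning_rate: float = 1e-3
    policy_noise: float = 0.2
    replay_buffer_size: int = 10**6
```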
Fig.8 Comparison of training average reward curve
Method | Success rate | Steps | Reward
PPO | 0.76 | 44.29 | 31.92
DDPG | 0.87 | 43.88 | 67.33
TD3 | 0.90 | 52.27 | 55.71
DWA-TD3 | 0.87 | 35.93 | 63.96
LSTM-TD3 | 0.91 | 44.75 | 60.07
DWA-LSTM TD3 | 0.91 | 36.89 | 70.19
Tab.2 Success rate, average steps, and average reward of trained models
Fig.9 Comparison of trajectory visualization in scenario 1
Method | Steps | Average velocity
DDPG | 134 | 0.389
PPO | 130 | 0.479
TD3 | 71 | 0.667
DWA-TD3 | 64 | 0.708
LSTM-TD3 | 76 | 0.596
DWA-LSTM TD3 | 68 | 0.673
Tab.3 Time steps and average velocity in scenario 1
Method | Steps | Average velocity
DDPG | 97 | 0.731
PPO | 195 | 0.345
TD3 | 96 | 0.617
DWA-TD3 | 63 | 0.614
LSTM-TD3 | 79 | 0.491
DWA-LSTM TD3 | 65 | 0.596
Tab.4 Time steps and average velocity in scenario 2
Fig.10 Comparison of trajectory visualization in scenario 2
Fig.11 Comparison of box plot of angular velocity data in scenario 1
Fig.12 Comparison of box plot of angular velocity data in scenario 2
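Figs. 11 and 12 summarize the angular-velocity distributions with box plots. The sketch below shows how the underlying per-run statistics (median and interquartile range) could be computed from a logged angular-velocity sequence; it is an illustrative reconstruction, not the authors' evaluation code.

```python
import numpy as np


def angular_velocity_spread(omega_log):
    """Summarize an episode's angular-velocity log the way a box plot
    does: median and interquartile range (IQR). A smaller IQR indicates
    smoother steering and hence safer motion."""
    omega = np.asarray(omega_log, dtype=float)
    q1, median, q3 = np.percentile(omega, [25, 50, 75])
    return {"median": median, "iqr": q3 - q1}
```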