Journal of Zhejiang University (Engineering Science)  2025, Vol. 59 Issue (8): 1671-1679    DOI: 10.3785/j.issn.1008-973X.2025.08.014
Computer Technology, Control Engineering, and Communication Technology
TD3 mapless navigation algorithm guided by dynamic window approach
Jiale LIU1, Yali XUE1,*, Shan CUI2, Jun HONG2
1. College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
2. Shanghai Electro-Mechanical Engineering Institute, Shanghai 201109, China
Abstract:

The DWA-LSTM TD3 algorithm was proposed in order to address the challenges of high training-data demand in deep reinforcement learning (DRL) and insufficient utilization of continuous navigation information. Robot motion was controlled based on the relative position of the target point, the robot's own velocity, and the current LiDAR data, without relying on any prior map. During training, the dynamic window approach (DWA) was employed to guide the twin delayed deep deterministic policy gradient (TD3) algorithm, thereby improving the quality of the collected training data. A long short-term memory (LSTM) network was integrated into the policy network to improve the agent's ability to process continuous navigation information. A simulation environment was constructed for training and evaluation, and comparative experiments were conducted against other methods. The experimental results show that DWA-LSTM TD3 achieves higher cumulative rewards and a higher navigation success rate under the same number of training steps; the fluctuation range of the navigation orientation angle is smaller and the trajectories are smoother, improving the motion safety of the robot. The algorithm efficiently accomplishes navigation tasks in various scenarios and exhibits strong generalization ability.
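As a rough illustration of the training scheme summarized above, the Python sketch below mixes DWA-planned actions into TD3's exploration during early training so that the replay buffer is seeded with higher-quality transitions. The env, agent, and dwa interfaces are hypothetical placeholders, not the authors' published code.

import random

# Hypothetical DWA-guided TD3 training loop (interfaces are assumed):
# early on, a coin flip decides whether the classical DWA planner or the
# noisy TD3 policy picks the (v, w) command.
def train(env, agent, dwa, total_steps=200_000, guide_steps=50_000, p_guide=0.5):
    state = env.reset()
    for step in range(total_steps):
        if step < guide_steps and random.random() < p_guide:
            action = dwa.plan(state)            # classical dynamic window approach
        else:
            action = agent.noisy_action(state)  # TD3 policy plus exploration noise
        next_state, reward, done, _ = env.step(action)
        agent.replay_buffer.add(state, action, reward, next_state, done)
        agent.update()                          # twin critics, delayed actor/target updates
        state = env.reset() if done else next_state

Annealing p_guide toward zero would hand control fully to the learned policy once the buffer holds enough successful trajectories.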

Key words: mapless navigation; dynamic window approach; deep reinforcement learning; twin delayed deep deterministic policy gradient algorithm; long short-term memory
Received: 2024-05-24    Published: 2025-07-28
CLC number: TP 242
Funding: National Natural Science Foundation of China (62073164); Shanghai Aerospace Science and Technology Innovation Fund (SAST2022-013).
Corresponding author: Yali XUE    E-mail: liujiale@nuaa.edu.cn; xueyali@nuaa.edu.cn
About the first author: Jiale LIU (1999—), male, master's student, engaged in research on agent navigation and decision-making. ORCID: 0009-0007-7253-5601. E-mail: liujiale@nuaa.edu.cn

Cite this article:

Jiale LIU, Yali XUE, Shan CUI, Jun HONG. TD3 mapless navigation algorithm guided by dynamic window approach. Journal of Zhejiang University (Engineering Science), 2025, 59(8): 1671-1679.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2025.08.014        https://www.zjujournals.com/eng/CN/Y2025/V59/I8/1671

Fig. 1  Kinematic model of the two-wheel differential-drive robot
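Fig. 1 presumably depicts the standard unicycle kinematics of a two-wheel differential chassis; the minimal sketch below restates that textbook model (it is not code from the paper).

import math

def diff_drive_step(x, y, theta, v, w, dt):
    # Standard unicycle update: v is the linear velocity, w the yaw rate.
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    theta += w * dt
    return x, y, theta

def wheels_to_body(v_left, v_right, track_width):
    # Map left/right wheel speeds to the body-frame command (v, w).
    return (v_right + v_left) / 2.0, (v_right - v_left) / track_width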
Fig. 2  Structure of the TD3 network
Fig. 3  Overall architecture of the DWA-guided TD3 algorithm
Fig. 4  LSTM network model
Fig. 5  LSTM data processing flow
Fig. 6  Actor-Critic network structure
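Figs. 4-6 suggest an actor in which an LSTM layer digests a short history of observations (LiDAR readings, goal-relative pose, velocities) before a fully connected head emits a bounded (v, w) command. The PyTorch sketch below reconstructs such an actor under that assumption; the layer sizes and observation packing are guesses, not the paper's published architecture.

import torch.nn as nn

class LSTMActor(nn.Module):
    # Illustrative LSTM-based policy network, not the authors' exact design.
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),  # actions scaled to [-1, 1]
        )

    def forward(self, obs_seq):
        # obs_seq: (batch, seq_len, obs_dim); keep the last step's features.
        out, _ = self.lstm(obs_seq)
        return self.head(out[:, -1])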
Fig. 7  Training environment for the navigation experiments
Parameter | Value
Discount factor $\gamma$ | 0.99
Soft target update rate $\tau$ | 0.005
Max time steps per episode (MaxStep) | 500
Learning rate (LearningRate) | 0.001
Policy noise (PolicyNoise) | 0.2
Replay buffer size (ReplayBuffer) | $10^6$
Table 1  Hyperparameter settings of the DWA-LSTM TD3 algorithm
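For convenience, the values in Table 1 can be collected into a plain configuration dict for a generic TD3 trainer (the field names are illustrative, not from the paper):

TD3_CONFIG = {
    "gamma": 0.99,            # discount factor
    "tau": 0.005,             # soft target update rate
    "max_steps": 500,         # max time steps per episode (MaxStep)
    "lr": 1e-3,               # learning rate (LearningRate)
    "policy_noise": 0.2,      # policy smoothing noise (PolicyNoise)
    "buffer_size": int(1e6),  # replay buffer capacity (ReplayBuffer)
}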
Fig. 8  Comparison of average training reward curves
Method | Success rate | Steps | Reward
PPO | 0.76 | 44.29 | 31.92
DDPG | 0.87 | 43.88 | 67.33
TD3 | 0.90 | 52.27 | 55.71
DWA-TD3 | 0.87 | 35.93 | 63.96
LSTM-TD3 | 0.91 | 44.75 | 60.07
DWA-LSTM TD3 | 0.91 | 36.89 | 70.19
Table 2  Average reward values of the trained models
Fig. 9  Comparison of visualized experimental trajectories in scenario 1
Method | Steps | Average speed
DDPG | 134 | 0.389
PPO | 130 | 0.479
TD3 | 71 | 0.667
DWA-TD3 | 64 | 0.708
LSTM-TD3 | 76 | 0.596
DWA-LSTM TD3 | 68 | 0.673
Table 3  Time steps used in the scenario 1 experiment
Method | Steps | Average speed
DDPG | 97 | 0.731
PPO | 195 | 0.345
TD3 | 96 | 0.617
DWA-TD3 | 63 | 0.614
LSTM-TD3 | 79 | 0.491
DWA-LSTM TD3 | 65 | 0.596
Table 4  Time steps used in the scenario 2 experiment
Fig. 10  Comparison of visualized experimental trajectories in scenario 2
Fig. 11  Box-plot comparison of angular velocity data in scenario 1
Fig. 12  Box-plot comparison of angular velocity data in scenario 2