Journal of Zhejiang University (Engineering Science)  2025, Vol. 59 Issue (7): 1451-1461    DOI: 10.3785/j.issn.1008-973X.2025.07.013
Computer Technology and Control Engineering
3D underwater AUV path planning method integrating adaptive potential field method and deep reinforcement learning
Kun HAO, Xuan MENG, Xiaofang ZHAO*, Zhisheng LI
School of Computer and Information Engineering, Tianjin Chengjian University, Tianjin 300384, China
Abstract:

A new 3D underwater AUV path planning method (IADQN) was proposed to address the low quality of generated paths and the poor dynamic obstacle avoidance of existing AUV path planning methods in complex marine environments. To compensate for the insufficient obstacle recognition and avoidance ability of AUVs in unknown underwater environments, an adaptive potential field method was proposed to improve the efficiency of AUV action selection. To overcome the low sample selection efficiency of the experience replay strategy in the traditional deep Q-network (DQN), a prioritized experience replay strategy was adopted, drawing from the experience pool the samples that contribute most to training. The AUV dynamically adjusts the reward function according to its current state, accelerating the convergence of IADQN during training. Simulation results show that, compared with the DQN scheme, IADQN efficiently plans time-saving, collision-free paths in a real ocean environment, reducing the AUV running time by 6.41 s and the maximum angle between the AUV heading and the ocean current by 10.39°.
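As a rough illustration of the prioritized experience replay strategy described above, the sketch below samples transitions in proportion to their absolute TD error. The class name, the exponent alpha, and all other identifiers are assumptions for illustration, not code from the paper.

```python
# Minimal sketch of proportional prioritized experience replay (an assumed
# form of the strategy named in the abstract; not the paper's actual code).
import random

class PrioritizedReplay:
    def __init__(self, capacity, alpha=0.6, eps=1e-3):
        self.capacity = capacity
        self.alpha = alpha        # how strongly TD error biases sampling
        self.eps = eps            # keeps zero-error transitions sampleable
        self.buffer, self.priorities = [], []

    def push(self, transition):
        # New transitions get the current max priority, so each is seen at least once.
        p = max(self.priorities, default=1.0)
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append(p)

    def sample(self, batch_size):
        # Draw indices with probability proportional to priority**alpha.
        weights = [p ** self.alpha for p in self.priorities]
        idx = random.choices(range(len(self.buffer)), weights=weights, k=batch_size)
        return idx, [self.buffer[i] for i in idx]

    def update_priorities(self, idx, td_errors):
        # After a learning step, refresh priorities with the new |TD errors|.
        for i, e in zip(idx, td_errors):
            self.priorities[i] = abs(e) + self.eps
```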

Key words: path planning; deep reinforcement learning; adaptive potential field method; autonomous underwater vehicle (AUV); dynamic reward function
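The dynamically adjusted reward can likewise be pictured as a state-dependent combination of goal progress, a collision penalty, and alignment with the ocean current (the quantity behind the γ_max metric reported below). The terms and weights in this sketch are assumptions, not the paper's actual reward function.

```python
# Hypothetical state-dependent reward of the kind the abstract describes:
# reward progress toward the goal, penalize collisions, and encourage motion
# aligned with the ocean current. Weights w1..w3 are illustrative assumptions.
import numpy as np

def reward(pos, new_pos, goal, current, collided, w1=1.0, w2=100.0, w3=0.1):
    pos, new_pos, goal, current = map(np.asarray, (pos, new_pos, goal, current))
    if collided:
        return -w2                                     # hard penalty on collision
    progress = np.linalg.norm(goal - pos) - np.linalg.norm(goal - new_pos)
    step = new_pos - pos
    # Angle between the AUV's step and the local current vector, in degrees.
    cosang = step @ current / (np.linalg.norm(step) * np.linalg.norm(current))
    angle = np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))
    return w1 * progress - w3 * angle                  # smaller angle, larger reward
```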
Received: 2024-06-21    Published: 2025-07-25
CLC:  TP 18  
Funding: National Natural Science Foundation of China (61902273); Chunhui Program of the Ministry of Education (HZKY20220590).
Corresponding author: Xiaofang ZHAO, E-mail: xfzhao@tcu.edu.cn
First author: Kun HAO (1979—), female, professor, Ph.D.; research interests: underwater sensor networks and computer vision. ORCID: 0000-0002-5627-7151. E-mail: kunhao@tcu.edu.cn

Cite this article:


Kun HAO, Xuan MENG, Xiaofang ZHAO, Zhisheng LI. 3D underwater AUV path planning method integrating adaptive potential field method and deep reinforcement learning. Journal of Zhejiang University (Engineering Science), 2025, 59(7): 1451-1461.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2025.07.013        https://www.zjujournals.com/eng/CN/Y2025/V59/I7/1451

Fig. 1  Simulation of 3D underwater terrain and ocean currents
Fig. 2  Six-degree-of-freedom AUV model
Degree of freedom | Meaning | Parameter
Surge | displacement along the x-axis, x/m | linear velocity u/(m·s⁻¹)
Sway | displacement along the y-axis, y/m | linear velocity v/(m·s⁻¹)
Heave | displacement along the z-axis, z/m | linear velocity w/(m·s⁻¹)
Roll | rotation angle about the x-axis, φ/(°) | angular velocity p/(rad·s⁻¹)
Pitch | rotation angle about the y-axis, θ/(°) | angular velocity q/(rad·s⁻¹)
Yaw | rotation angle about the z-axis, ψ/(°) | angular velocity r/(rad·s⁻¹)
Table 1  Parameters of the six-degree-of-freedom AUV model
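For concreteness, the twelve quantities in Table 1 can be collected into a single state record. The sketch below is a plain transcription of the table (field names follow its symbols), not code from the paper.

```python
# Six-degree-of-freedom AUV state, transcribed from Table 1.
from dataclasses import dataclass

@dataclass
class AUVState:
    x: float; y: float; z: float          # displacements along the x, y, z axes, m
    phi: float; theta: float; psi: float  # roll, pitch, yaw angles, deg
    u: float; v: float; w: float          # surge, sway, heave velocities, m/s
    p: float; q: float; r: float          # roll, pitch, yaw rates, rad/s
```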
Fig. 3  AUV action directions in the 3D grid environment
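The action set of Fig. 3 can be sketched as moves to the adjacent cells of a 3D grid, giving 3³ − 1 = 26 directions; that the paper uses exactly this neighbourhood is an assumption.

```python
# Hypothetical 26-neighbour action set for a 3D grid (assumed from Fig. 3).
from itertools import product

ACTIONS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]
assert len(ACTIONS) == 26  # 3**3 - 1 moves to adjacent cells

def apply(cell, action):
    """Move one step: both arguments are (x, y, z) integer triples."""
    return tuple(c + a for c, a in zip(cell, action))
```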
Fig. 4  Framework of the 3D underwater AUV path planning method integrating the adaptive potential field method and deep reinforcement learning
Fig. 5  Local minimum problem caused by obstacles
Fig. 6  Schematic of virtual target point selection
Fig. 7  Schematic of the "obstacle-free advance" rule
Fig. 8  Resultant force of the artificial potential field
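The resultant force of Fig. 8 follows the textbook artificial-potential-field construction: a linear attraction toward the goal plus a repulsion that acts only within an influence radius of each obstacle. The sketch below uses that classic form; the gains k_att, k_rep and radius rho0 are illustrative, and the paper's adaptive variant refines this basic scheme.

```python
# Textbook APF resultant force (a sketch; gains and details are assumptions).
import numpy as np

def apf_force(pos, goal, obstacles, k_att=1.0, k_rep=100.0, rho0=5.0):
    pos, goal = np.asarray(pos, float), np.asarray(goal, float)
    f = k_att * (goal - pos)                  # attraction grows with distance
    for obs in obstacles:
        diff = pos - np.asarray(obs, float)
        rho = np.linalg.norm(diff)
        if 0.0 < rho < rho0:                  # repulsion only inside radius rho0
            f += k_rep * (1.0/rho - 1.0/rho0) / rho**2 * (diff / rho)
    return f                                  # resultant force vector
```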
Parameter | Value | Parameter | Value
Degrees of freedom | 6 | ε | 1
μ | 0.25 | Experience replay buffer size | 10⁵
ω | 10 | Number of sampled transitions | 2⁶
t | 0 | v_max/(m·s⁻¹) | 3
ε_dec | 0.999 | a_AUV/(m·s⁻²) | 0.5
ε_min | 0 | |
Table 2  Parameter values for the AUV path planning performance comparison experiments
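The exploration parameters in Table 2 (ε = 1, ε_dec = 0.999, ε_min = 0) fit the usual ε-greedy schedule, in which ε is multiplied by ε_dec after each step and floored at ε_min; that the paper combines them this way is assumed, and the greedy branch below is schematic.

```python
# epsilon-greedy schedule using the Table 2 values (usage is an assumption).
import random

EPS, EPS_DEC, EPS_MIN = 1.0, 0.999, 0.0

def select_action(q_values, eps):
    """Random action with probability eps, otherwise the greedy one."""
    if random.random() < eps:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)

def decay(eps):
    return max(EPS_MIN, eps * EPS_DEC)
```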
Fig. 9  3D underwater environment
Fig. 10  Comparison of paths generated by different methods in the 3D underwater environment
Method | l/m | S_o | γ_max/(°) | t_r/s
IADQN | 63 | 17.28 | 71.12 | 95.45
APF | 15 000 | 647.95 | 131.47 | 13 580.69
A* | 63 | 19 | 80.04 | 106.56
RRT | 71.62 | 19.11 | 136.19 | 112.18
DQN | 57 | 21.99 | 87.18 | 158.83
Table 3  Performance comparison of different path planning methods in the 3D underwater environment
Fig. 11  Simulation of the local seabed environment
Fig. 12  Comparison of paths generated by different methods in the local seabed environments
Method | Local seabed environment 1 | Local seabed environment 2
 | l/m | S_o | γ_max/(°) | t_r/s | l/m | S_o | γ_max/(°) | t_r/s
IADQN | 2 171.98 | 11.00 | 65.23 | 616.86 | 4 462.22 | 23.56 | 76.26 | 1 264.10
APF | 2 380.88 | 29.85 | 90.00 | 815.17 | 26 184.47 | 782.26 | 90.00 | 9 142.90
A* | 2 171.98 | 13.85 | 66.59 | 751.13 | 4 492.01 | 29.85 | 70.47 | 1 466.79
RRT | 2 279.33 | 13.09 | 90.00 | 622.65 | 6 592.56 | 30.63 | 143.03 | 1 786.97
DQN | 2 171.98 | 11.00 | 75.62 | 623.27 | 4 462.22 | 32.99 | 93.28 | 1 378.76
PPO | 2 171.98 | 25.13 | 90.00 | 737.79 | 4 462.22 | 26.70 | 84.00 | 1 392.03
Table 4  Performance comparison of different path planning methods in the local seabed environments
Fig. 13  Dynamic obstacle avoidance of the proposed path planning method
Fig. 14  Comparison of paths generated by the two methods in the dynamic environment
Fig. 15  Convergence speed comparison of the two path planning methods in the dynamic environment