Journal of Zhejiang University (Engineering Science)  2025, Vol. 59 Issue (7): 1451-1461    DOI: 10.3785/j.issn.1008-973X.2025.07.013
Computer Technology and Control Engineering
3D underwater AUV path planning method integrating adaptive potential field method and deep reinforcement learning
Kun HAO, Xuan MENG, Xiaofang ZHAO*, Zhisheng LI
School of Computer and Information Engineering, Tianjin Chengjian University, Tianjin 300384, China
Abstract:

A new 3D underwater AUV path planning method (IADQN) was proposed to address the low quality of generated paths and the poor dynamic obstacle avoidance of existing AUV path planning methods in complex marine environments. To compensate for the insufficient obstacle recognition and avoidance ability of AUVs in unknown underwater environments, an adaptive potential field method was proposed to improve the efficiency of AUV action selection. To overcome the low sample selection efficiency of the experience replay strategy in the traditional deep Q-network (DQN), a prioritized experience replay strategy was adopted, drawing from the experience pool the samples that contribute most to training. The AUV dynamically adjusts the reward function according to its current state, accelerating the convergence of IADQN during training. Simulation results show that, compared with the DQN scheme, IADQN efficiently plans time-saving, collision-free paths in a real ocean environment, reducing the AUV running time by 6.41 s and the maximum angle between the AUV heading and the ocean current by 10.39°.
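As a rough illustration of the prioritized experience replay strategy described above, the sketch below samples transitions in proportion to their absolute TD error. The class name, the exponent alpha, and all other identifiers are assumptions for illustration, not code from the paper.

```python
# Minimal sketch of proportional prioritized experience replay (an assumed
# form of the strategy named in the abstract; not the paper's actual code).
import random

class PrioritizedReplay:
    def __init__(self, capacity, alpha=0.6, eps=1e-3):
        self.capacity = capacity
        self.alpha = alpha        # how strongly TD error biases sampling
        self.eps = eps            # keeps zero-error transitions sampleable
        self.buffer, self.priorities = [], []

    def push(self, transition):
        # New transitions get the current max priority, so each is seen at least once.
        p = max(self.priorities, default=1.0)
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append(p)

    def sample(self, batch_size):
        # Draw indices with probability proportional to priority**alpha.
        weights = [p ** self.alpha for p in self.priorities]
        idx = random.choices(range(len(self.buffer)), weights=weights, k=batch_size)
        return idx, [self.buffer[i] for i in idx]

    def update_priorities(self, idx, td_errors):
        # After a learning step, refresh priorities with the new |TD errors|.
        for i, e in zip(idx, td_errors):
            self.priorities[i] = abs(e) + self.eps
```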

Key words: path planning; deep reinforcement learning; adaptive potential field method; autonomous underwater vehicle (AUV); dynamic reward function
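The dynamically adjusted reward can likewise be pictured as a state-dependent combination of goal progress, a collision penalty, and alignment with the ocean current (the quantity behind the γ_max metric reported below). The terms and weights in this sketch are assumptions, not the paper's actual reward function.

```python
# Hypothetical state-dependent reward of the kind the abstract describes:
# reward progress toward the goal, penalize collisions, and encourage motion
# aligned with the ocean current. Weights w1..w3 are illustrative assumptions.
import numpy as np

def reward(pos, new_pos, goal, current, collided, w1=1.0, w2=100.0, w3=0.1):
    pos, new_pos, goal, current = map(np.asarray, (pos, new_pos, goal, current))
    if collided:
        return -w2                                     # hard penalty on collision
    progress = np.linalg.norm(goal - pos) - np.linalg.norm(goal - new_pos)
    step = new_pos - pos
    # Angle between the AUV's step and the local current vector, in degrees.
    cosang = step @ current / (np.linalg.norm(step) * np.linalg.norm(current))
    angle = np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))
    return w1 * progress - w3 * angle                  # smaller angle, larger reward
```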
Received: 2024-06-21    Published: 2025-07-25
CLC:  TP 18  
Funding: National Natural Science Foundation of China (61902273); Chunhui Program of the Ministry of Education (HZKY20220590).
Corresponding author: Xiaofang ZHAO, E-mail: xfzhao@tcu.edu.cn
First author: Kun HAO (1979—), female, professor, Ph.D.; research interests: underwater sensor networks and computer vision. ORCID: 0000-0002-5627-7151. E-mail: kunhao@tcu.edu.cn

Cite this article:


Kun HAO, Xuan MENG, Xiaofang ZHAO, Zhisheng LI. 3D underwater AUV path planning method integrating adaptive potential field method and deep reinforcement learning. Journal of Zhejiang University (Engineering Science), 2025, 59(7): 1451-1461.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2025.07.013        https://www.zjujournals.com/eng/CN/Y2025/V59/I7/1451

Fig. 1  Simulation of 3D underwater terrain and ocean currents
Fig. 2  Six-degree-of-freedom AUV model
Degree of freedom | Meaning | Parameter
Surge | displacement along the x-axis, x/m | linear velocity u/(m·s⁻¹)
Sway | displacement along the y-axis, y/m | linear velocity v/(m·s⁻¹)
Heave | displacement along the z-axis, z/m | linear velocity w/(m·s⁻¹)
Roll | rotation angle about the x-axis, φ/(°) | angular velocity p/(rad·s⁻¹)
Pitch | rotation angle about the y-axis, θ/(°) | angular velocity q/(rad·s⁻¹)
Yaw | rotation angle about the z-axis, ψ/(°) | angular velocity r/(rad·s⁻¹)
Table 1  Parameters of the six-degree-of-freedom AUV model
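For concreteness, the twelve quantities in Table 1 can be collected into a single state record. The sketch below is a plain transcription of the table (field names follow its symbols), not code from the paper.

```python
# Six-degree-of-freedom AUV state, transcribed from Table 1.
from dataclasses import dataclass

@dataclass
class AUVState:
    x: float; y: float; z: float          # displacements along the x, y, z axes, m
    phi: float; theta: float; psi: float  # roll, pitch, yaw angles, deg
    u: float; v: float; w: float          # surge, sway, heave velocities, m/s
    p: float; q: float; r: float          # roll, pitch, yaw rates, rad/s
```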
Fig. 3  AUV action directions in the 3D grid environment
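The action set of Fig. 3 can be sketched as moves to the adjacent cells of a 3D grid, giving 3³ − 1 = 26 directions; that the paper uses exactly this neighbourhood is an assumption.

```python
# Hypothetical 26-neighbour action set for a 3D grid (assumed from Fig. 3).
from itertools import product

ACTIONS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]
assert len(ACTIONS) == 26  # 3**3 - 1 moves to adjacent cells

def apply(cell, action):
    """Move one step: both arguments are (x, y, z) integer triples."""
    return tuple(c + a for c, a in zip(cell, action))
```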
Fig. 4  Framework of the 3D underwater AUV path planning method integrating the adaptive potential field method and deep reinforcement learning
Fig. 5  Local minimum problem caused by obstacles
Fig. 6  Schematic of virtual target point selection
Fig. 7  Schematic of the "obstacle-free advance" rule
Fig. 8  Resultant force of the artificial potential field
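The resultant force of Fig. 8 follows the textbook artificial-potential-field construction: a linear attraction toward the goal plus a repulsion that acts only within an influence radius of each obstacle. The sketch below uses that classic form; the gains k_att, k_rep and radius rho0 are illustrative, and the paper's adaptive variant refines this basic scheme.

```python
# Textbook APF resultant force (a sketch; gains and details are assumptions).
import numpy as np

def apf_force(pos, goal, obstacles, k_att=1.0, k_rep=100.0, rho0=5.0):
    pos, goal = np.asarray(pos, float), np.asarray(goal, float)
    f = k_att * (goal - pos)                  # attraction grows with distance
    for obs in obstacles:
        diff = pos - np.asarray(obs, float)
        rho = np.linalg.norm(diff)
        if 0.0 < rho < rho0:                  # repulsion only inside radius rho0
            f += k_rep * (1.0/rho - 1.0/rho0) / rho**2 * (diff / rho)
    return f                                  # resultant force vector
```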
Parameter | Value | Parameter | Value
Degrees of freedom | 6 | ε | 1
μ | 0.25 | Experience replay buffer size | 10⁵
ω | 10 | Number of sampled transitions | 2⁶
t | 0 | v_max/(m·s⁻¹) | 3
ε_dec | 0.999 | a_AUV/(m·s⁻²) | 0.5
ε_min | 0 | |
Table 2  Parameter values for the AUV path planning performance comparison experiments
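The exploration parameters in Table 2 (ε = 1, ε_dec = 0.999, ε_min = 0) fit the usual ε-greedy schedule, in which ε is multiplied by ε_dec after each step and floored at ε_min; that the paper combines them this way is assumed, and the greedy branch below is schematic.

```python
# epsilon-greedy schedule using the Table 2 values (usage is an assumption).
import random

EPS, EPS_DEC, EPS_MIN = 1.0, 0.999, 0.0

def select_action(q_values, eps):
    """Random action with probability eps, otherwise the greedy one."""
    if random.random() < eps:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)

def decay(eps):
    return max(EPS_MIN, eps * EPS_DEC)
```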
Fig. 9  3D underwater environment
Fig. 10  Comparison of paths generated by different methods in the 3D underwater environment
Method | l/m | S_o | γ_max/(°) | t_r/s
IADQN | 63 | 17.28 | 71.12 | 95.45
APF | 15 000 | 647.95 | 131.47 | 13 580.69
A* | 63 | 19 | 80.04 | 106.56
RRT | 71.62 | 19.11 | 136.19 | 112.18
DQN | 57 | 21.99 | 87.18 | 158.83
Table 3  Performance comparison of different path planning methods in the 3D underwater environment
Fig. 11  Simulation of the local seabed environment
Fig. 12  Comparison of paths generated by different methods in the local seabed environments
Method | Local seabed environment 1 | Local seabed environment 2
 | l/m | S_o | γ_max/(°) | t_r/s | l/m | S_o | γ_max/(°) | t_r/s
IADQN | 2 171.98 | 11.00 | 65.23 | 616.86 | 4 462.22 | 23.56 | 76.26 | 1 264.10
APF | 2 380.88 | 29.85 | 90.00 | 815.17 | 26 184.47 | 782.26 | 90.00 | 9 142.90
A* | 2 171.98 | 13.85 | 66.59 | 751.13 | 4 492.01 | 29.85 | 70.47 | 1 466.79
RRT | 2 279.33 | 13.09 | 90.00 | 622.65 | 6 592.56 | 30.63 | 143.03 | 1 786.97
DQN | 2 171.98 | 11.00 | 75.62 | 623.27 | 4 462.22 | 32.99 | 93.28 | 1 378.76
PPO | 2 171.98 | 25.13 | 90.00 | 737.79 | 4 462.22 | 26.70 | 84.00 | 1 392.03
Table 4  Performance comparison of different path planning methods in the local seabed environments
Fig. 13  Dynamic obstacle avoidance of the proposed path planning method
Fig. 14  Comparison of paths generated by the two methods in the dynamic environment
Fig. 15  Convergence speed comparison of the two path planning methods in the dynamic environment