Please wait a minute...
Journal of ZheJiang University (Engineering Science)  2026, Vol. 60 Issue (5): 964-976    DOI: 10.3785/j.issn.1008-973X.2026.05.006
    
Multi-ship collision avoidance via route exchange mechanism: strategy learning and game-theoretic decision making
Yang WANG1,2,3(),Hongchao LIU1,2,3,Chi TIAN4,Bing WU1,2,3,Di ZHANG1,2,3,*()
1. State Key Laboratory of Maritime Technology and Safety, Wuhan University of Technology, Wuhan 430063, China
2. Intelligent Transportation Systems Research Center, Wuhan University of Technology, Wuhan 430063, China
3. School of Transportation and Logistics Engineering, Wuhan University of Technology, Wuhan 430063, China
4. CSSC Pride (Nanjing) Technology Group Co., Ltd, Nanjing 211106, China
Download: HTML     PDF(4113KB) HTML
Export: BibTeX | EndNote (RIS)      

Abstract  

To address the multi-ship collision avoidance problem in the context of growing onboard intelligence, a cooperative collision avoidance game model based on multi-agent reinforcement learning was developed using the route exchange mechanism. Real-time sharing and negotiation of intended route information among ships were enabled. The multi-ship collision avoidance decision was transformed into a multi-agent cooperative game model, with each ship possessing independent decision-making and execution capabilities and being driven by rationality and economic considerations. The objective is to optimize navigational efficiency, minimize collision risk, and comply with anti-collision rules. The multi-agent deep deterministic policy gradient algorithm was employed within a centralized training with decentralized execution framework to optimize collision avoidance strategies, enabling an approach to the Pareto optimal solution. Simulation results demonstrate that optimized routes obtained through reasonable heading adjustments effectively avoid collision zones, balancing safety, compliance, and navigational efficiency. The model that integrates multi-agent reinforcement learning and game theory provides a feasible solution for intelligent ship collision avoidance decisions under the E-navigation paradigm.



Key wordswaterway transportation      multi-ship collision avoidance      route exchange      multi-agent reinforcement learning      game theory     
Received: 29 May 2025      Published: 06 May 2026
CLC:  U 675.96  
Fund:  国家自然科学基金资助项目(52425210,52372320);国家重点研发计划资助项目(2023YFB4301800,2023YFC3010803).
Corresponding Authors: Di ZHANG     E-mail: wangyang.itsc@whut.edu.cn;zhangdi@whut.edu.cn
Cite this article:

Yang WANG,Hongchao LIU,Chi TIAN,Bing WU,Di ZHANG. Multi-ship collision avoidance via route exchange mechanism: strategy learning and game-theoretic decision making. Journal of ZheJiang University (Engineering Science), 2026, 60(5): 964-976.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2026.05.006     OR     https://www.zjujournals.com/eng/Y2026/V60/I5/964


航线交换机制下多船避碰的策略学习与博弈决策

针对船舶智能化水平不断提升背景下面临的多船避碰问题,通过航线交换机制,构建基于多智能体强化学习的协同避碰博弈模型,以实现船舶间意向航线信息的实时共享与协商. 由于每艘船舶具备独立的决策与执行能力,在理性与经济性的联合驱动下,将多船避碰决策转化为多智能体协同博弈模型. 各船舶旨在优化航线便捷性、最小化碰撞风险并遵循避让规则,采用多智能体深度确定性策略梯度算法,通过集中训练-分布执行框架优化避碰策略,逐步逼近Pareto最优解. 仿真结果显示,通过合理调整航向得到的优化航线能够有效规避碰撞区域,兼顾安全性与合规性,提升航行效率. 融合多智能体强化学习与博弈论的避碰模型为E-航海条件下智能船舶避碰决策提供了较好可行性的实施方案.


关键词: 水路交通,  多船避碰,  航线交换,  多智能体强化学习,  博弈论 
Fig.1 Principle of route exchange
Fig.2 Closest spatio-temporal distance among ships
Fig.3 Avoidance actions of ships in typical encounter scenarios
Fig.4 Interpretations of stand-on/give-way relationship in ship encounters
Fig.5 Range of waypoint adjustment for route revision
Fig.6 Impact of waypoint loss on closest spatio-temporal distance under different route curvature conditions
Fig.7 Effect of wind, wave, and current on ship trajectory
Fig.8 Framework of multi-agent deep deterministic policy gradient algorithm
Fig.9 Effect of reward weight combination on key parameters for collision avoidance decision-making
参数数值参数数值
训练最大回合数104经验池大小$ D $106
最大时间步长Step500网络学习率$ \alpha $0.0005
采样样本数$ K $256奖励折扣系数$ \gamma $0.98
软更新系数$ \tau $0.0005
Tab.1 Parameter of multi-agent deep deterministic policy gradient algorithm
Fig.10 Network design of multi-agent deep deterministic policy gradient algorithm
船1船2船3船4
$ \text{route}_{0}^{1} $时间位置$ \text{route}_{0}^{2} $时间位置$ \text{route}_{0}^{3} $时间位置$ \text{route}_{0}^{4} $时间位置
$ \text{WP}_{0}^{1}(t_{0}^{1}) $13:30:00(7.40,3.62)$ \text{WP}_{0}^{2}(t_{0}^{1}) $13:30:00(1.56,11.81)$ \text{WP}_{0}^{3}(t_{0}^{1}) $13:30:00(11.73,10.87)$ \text{WP}_{0}^{4}(t_{0}^{1}) $13:30:00(4.49,14.25)
$ \text{WP}_{0}^{1}(t_{0}^{2}) $13:36:52(7.40,5.12)$ \text{WP}_{0}^{2}(t_{0}^{2}) $13:37:19(2.93,11.21)$ \text{WP}_{0}^{3}(t_{0}^{2}) $13:36:03(10.57,9.92)$ \text{WP}_{0}^{4}(t_{0}^{2}) $13:36:34(4.79,12.78)
$ \text{WP}_{0}^{1}(t_{0}^{3}) $13:43:44(7.40,6.62)$ \text{WP}_{0}^{2}(t_{0}^{3}) $13:44:38(4.31,10.61)$ \text{WP}_{0}^{3}(t_{0}^{3}) $13:42:05(9.40,8.97)$ \text{WP}_{0}^{4}(t_{0}^{3}) $13:43:08(5.08,11.31)
$ \text{WP}_{0}^{1}(t_{0}^{4}) $13:50:37(7.40,8.12)$ \text{WP}_{0}^{2}(t_{0}^{4}) $13:51:57(5.68,10.01)$ \text{WP}_{0}^{3}(t_{0}^{4}) $13:48:07(8.24,8.03)$ \text{WP}_{0}^{4}(t_{0}^{4}) $13:49:43(5.38,9.84)
$ \text{WP}_{0}^{1}(t_{0}^{5}) $13:57:29(7.40,9.62)$ \text{WP}_{0}^{2}(t_{0}^{5}) $13:59:16(7.05,9.41)$ \text{WP}_{0}^{3}(t_{0}^{5}) $13:54:10(7.07,7.08)$ \text{WP}_{0}^{4}(t_{0}^{5}) $13:56:17(5.68,8.37)
$ \text{WP}_{0}^{1}(t_{0}^{6}) $14:04:21(7.40,11.12)$ \text{WP}_{0}^{2}(t_{0}^{6}) $14:06:35(8.43,8.81)$ \text{WP}_{0}^{3}(t_{0}^{6}) $14:00:12(5.91,6.14)$ \text{WP}_{0}^{4}(t_{0}^{6}) $14:02:51(5.98,6.90)
$ \text{WP}_{0}^{1}(t_{0}^{7}) $14:11:13(7.40,12.62)$ \text{WP}_{0}^{2}(t_{0}^{7}) $14:13:54(9.80,8.21)$ \text{WP}_{0}^{3}(t_{0}^{7}) $14:06:15(4.74,5.19)$ \text{WP}_{0}^{4}(t_{0}^{7}) $14:09:25(6.28,5.43)
$ \text{WP}_{0}^{1}(t_{0}^{8}) $14:18:05(7.40,14.12)$ \text{WP}_{0}^{2}(t_{0}^{8}) $14:21:13(11.18,7.60)$ \text{WP}_{0}^{3}(t_{0}^{8}) $14:12:17(3.58,4.25)$ \text{WP}_{0}^{4}(t_{0}^{8}) $14:15:58(6.57,3.96)
Tab.2 Waypoint information for four ships
Fig.11 Routing schemes at different time steps under multi-agent deep deterministic policy gradient algorithm
Fig.12 Variation of distance between ship pairs over time
Fig.13 Variation of total reward and average route deviation with training episodes
Fig.14 Initial scenario of multi-ship encounter
Fig.15 Pareto-optimal routes solved by multi-agent deep deterministic policy gradient algorithm
Fig.16 Performance comparison of different collision avoidance algorithms
Fig.17 Principle of velocity obstacle algorithm
Fig.18 Comparison of routes generated by different collision avoidance algorithms
[1]   中国船东互保协会. 2023船舶安全风险报告[EB/OL]. (2024−01−23)[2025−05−11]. https://www.chinapandi.com/index.php/cn/?option=com_attachments&task=download&id=590.
[2]   International Maritime Organization. Strategy for the development and implementation of e-navigation [EB/OL]. (2011−07−25)[2025−05−11]. https://wwwcdn.imo.org/localresources/en/OurWork/Safety/Documents/enavigation/MSC%2085%20-%20annex%2020%20-%20Strategy%20for%20the%20development%20and%20implementation%20of%20e-nav.pdf.
[3]   International Maritime Organization. E-navigation strategy implementation plan [EB/OL]. (2018−05−28)[2025−05−11]. https://wwwcdn.imo.org/localresources/en/OurWork/Safety/Documents/enavigation/MSC.1-Circ.1595%20-%20E-Navigation%20Strategy%20Implementation%20Plan%20-%20Update%201%20(Secretariat)%20(2).pdf.
[4]   贺益雄, 代永刚, 赵兴亚, 等 河口深槽可航宽度变化水域航行决策方法[J]. 上海交通大学学报, 2025, 59 (4): 489- 502
HE Yixiong, DAI Yonggang, ZHAO Xingya, et al Navigation decision method in estuary deep trough with varying width of navigable waters[J]. Journal of Shanghai Jiaotong University, 2025, 59 (4): 489- 502
doi: 10.16183/j.cnki.jsjtu.2023.356
[5]   吴建军, 陈炎, 朱清华, 等 紧迫危险威胁下交叉相遇局面应急操船方法[J]. 中国安全科学学报, 2024, 34 (5): 238- 246
WU Jianjun, CHEN Yan, ZHU Qinghua, et al Emergency ship maneuvering method for crossing encounter situation under immediate danger threat[J]. China Safety Science Journal, 2024, 34 (5): 238- 246
doi: 10.16265/j.cnki.issn1003-3033.2024.05.0910
[6]   HUANG Y, VAN GELDER P, WEN Y Velocity obstacle algorithms for collision prevention at sea[J]. Ocean Engineering, 2018, 151: 308- 321
doi: 10.1016/j.oceaneng.2018.01.001
[7]   WANG T, YAN X, WANG Y, et al Ship domain model for multi-ship collision avoidance decision-making with COLREGs based on artificial potential field[J]. TransNav: International Journal on Marine Navigation and Safety of Sea Transportation, 2017, 11 (1): 85- 92
doi: 10.12716/1001.11.01.09
[8]   NING J, CHEN H, LI T, et al COLREGs-compliant unmanned surface vehicles collision avoidance based on multi-objective genetic algorithm[J]. IEEE Access, 2020, 8: 190367- 190377
doi: 10.1109/ACCESS.2020.3030262
[9]   WANG T, WU Q, ZHANG J, et al Autonomous decision-making scheme for multi-ship collision avoidance with iterative observation and inference[J]. Ocean Engineering, 2020, 197: 106873
doi: 10.1016/j.oceaneng.2019.106873
[10]   欧阳旭东, 支云翔, 王腾飞, 等 基于扩展式动态博弈的多船避碰决策模型[J]. 中国安全科学学报, 2020, 30 (1): 128- 135
OUYANG Xudong, ZHI Yunxiang, WANG Tengfei, et al Antensive form game theory based multi-ship collision avoidance scheme[J]. China Safety Science Journal, 2020, 30 (1): 128- 135
doi: 10.16265/j.cnki.issn1003-3033.2020.01.020
[11]   崔浩, 张新宇, 王警, 等 自主船舶与有人驾驶船舶动态博弈避碰决策[J]. 中国舰船研究, 2024, 19 (1): 238- 247
CUI Hao, ZHANG Xinyu, WANG Jing, et al Dynamic game collision avoidance decision-making for autonomous and manned ships[J]. Chinese Journal of Ship Research, 2024, 19 (1): 238- 247
[12]   ZHANG X, WANG C, LIU Y, et al Decision-making for the autonomous navigation of maritime autonomous surface ships based on scene division and deep reinforcement learning[J]. Sensor, 2019, 19 (18): 4055
doi: 10.3390/s19184055
[13]   黄仁贤, 罗亮 基于多智能体深度强化学习的多船协同避碰策略[J]. 计算机集成制造系统, 2024, 30 (6): 1972- 1988
HUANG Renxian, LUO Liang Multi-ship collaborative collision avoidance strategy based on multi-agent deep reinforcement learning[J]. Computer Integrated Manufacturing Systems, 2024, 30 (6): 1972- 1988
doi: 10.13196/j.cims.2023.0382
[14]   WANG Z, CHEN P, CHEN L, et al Collaborative collision avoidance approach for USVs based on multi-agent deep reinforcement learning[J]. IEEE Transactions on Intelligent Transportation Systems, 2025, 26 (4): 4780- 4794
doi: 10.1109/TITS.2025.3547775
[15]   Marine Safety Investigation Unit. Marine safety investigation report [EB/OL]. (2020−03−18)[2025−05−11]. https://www.marfag.no/k52/media/mt-aseem-final-safety-investigation-report.pdf.
[16]   刘立群, 吴超仲, 褚端峰, 等 基于Vondrak滤波和三次样条插值的船舶轨迹修复研究[J]. 交通信息与安全, 2015, 33 (4): 100- 105
LIU Liqun, WU Chaozhong, CHU Duanfeng, et al A study of ship trajectory restoration based on Vondrak filtering and cubic spline interpolation[J]. Journal of Transport Information and Safety, 2015, 33 (4): 100- 105
doi: 10.3963/j.issn1674-4861.2015.04.016
[17]   WANG Y, YE Q, LAU H, et al Nash bargaining strategy in autonomous decision making for multi-ship collision avoidance based on route exchange[J]. IET Intelligent Transport Systems, 2025, 19 (1): e70025
doi: 10.1049/itr2.70025
[18]   WANG Y, ZHANG J, CHEN X, et al A spatial-temporal forensic analysis for inland-water ship collisions using AIS data[J]. Safety Science, 2013, 57: 187- 202
[19]   ZHANG K, HUANG L, HE Y, et al A real-time multi-ship collision avoidance decision-making system for autonomous ships considering ship motion uncertainty[J]. Ocean Engineering, 2023, 278: 114205
doi: 10.1016/j.oceaneng.2023.114205
[20]   LI G, ZHANG X Research on the influence of wind, waves, and tidal current on ship turning ability based on Norrbin model[J]. Ocean Engineering, 2022, 259: 111875
doi: 10.1016/j.oceaneng.2022.111875
[21]   符小卫, 王辉, 徐哲 基于DE-MADDPG的多无人机协同追捕策略[J]. 宇航学报, 2022, 43 (5): 325311
FU Xiaowei, WANG Hui, XU Ze Cooperative pursuit strategy for multi-UAVs based on DE-MADDPG algorithm[J]. Acta Aeronautica et Astronautica Sinica, 2022, 43 (5): 325311
[22]   WANG N An intelligent spatial collision risk based on the quaternion ship domain[J]. The Journal of Navigation, 2010, 63: 733- 749
doi: 10.1017/S0373463310000202
[23]   Sea Traffic Management. Route exchange ship-ship [EB/OL]. (2015−12−10)[2025−05−11]. https://stm-stmvalidation.s3.eu-west-1.amazonaws.com/uploads/20160420153149/Draft-description-of-test-bed-services-and-information-needs_2015-12-10.pdf.
[1] Jinchi JIAO,Jian SUN,Xunyou NI. Dynamic game and carbon emission effects between urban ride-sourcing and cruise taxis[J]. Journal of ZheJiang University (Engineering Science), 2025, 59(7): 1373-1384.
[2] Yulong DONG,Lu CHEN,Zhongkai BAO. Resource allocation of aircraft final assembly logistics distribution system based on game theory[J]. Journal of ZheJiang University (Engineering Science), 2025, 59(1): 120-129.
[3] Junhui ZHANG,Xiaoman GUO,Jingxian WANG,Zongjie FU,Yuxi LIU. Shared lane-keeping control based on non-cooperative game theory[J]. Journal of ZheJiang University (Engineering Science), 2024, 58(5): 1001-1008.
[4] PAN Ju-long, LI Shan-ping, ZHANG Dao-yuan. Detecting suspicious node within one cluster in wireless sensor network
using game theoretic approach
[J]. Journal of ZheJiang University (Engineering Science), 2012, 46(1): 72-78.
[5] TAN Hai-Yun, HONG Yuan-Rui, XIE Dun, et al. Game analysis of electricity markets considering bidding of ramp-rate[J]. Journal of ZheJiang University (Engineering Science), 2009, 43(6): 1152-1157.