Journal of Zhejiang University (Engineering Science), 2025, Vol. 59, Issue (9): 1996-2004    DOI: 10.3785/j.issn.1008-973X.2025.09.023
Traffic Engineering
Decision-making and planning of intelligent vehicle based on reachable set and reinforcement learning
Hongwei GAO1, Bingxu SHANG1, Xinkang ZHANG2, Hongfeng WANG1, Wei HE2, Xiaofei PEI2,*
1. R&D Center, China FAW Group Corporation, Changchun 130011, China
2. School of Automotive Engineering, Wuhan University of Technology, Wuhan 430070, China
Abstract:

A decision-making and planning algorithm integrating reachable sets with reinforcement learning (RL) was proposed to address the limitations of traditional reachable sets, which cannot effectively handle the behavioral interactions between the intelligent vehicle and adjacent vehicles in dynamic, uncertain environments and incur excessive computational cost. An RL model was incorporated into the algorithm framework to guide multi-step decision-making, explicitly specifying the sequence of macro driving behaviors over the planning horizon. First, an RL decision model was established and formulated as a Markov decision process (MDP), with the state space, action space, and reward function designed accordingly. Second, feasible driving regions were partitioned based on driving semantics: lateral and longitudinal behavioral predicates were introduced to segment the reachable region at each time step into a finite number of feasible regions via a two-stage (lateral-first, then longitudinal) segmentation. Finally, the ego vehicle's position was inferred from the action output by the RL model at each time step to determine the optimal driving region, forming a driving corridor. The effectiveness of the proposed algorithm was validated through long-duration cyclic tests in dynamic and uncertain scenarios and comparative analyses of typical scenarios. Experimental results demonstrated that, compared with existing reachable-set algorithms, the proposed method achieved better overall performance in terms of driving efficiency, safety, comfort, and real-time capability.

Key words: intelligent vehicle    trajectory planning    reachable set    reinforcement learning    driving corridor
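As a rough, self-contained illustration of the pipeline summarized in the abstract (an RL policy issues one macro driving action per step of the planning horizon, and that action selects one of the semantically segmented feasible regions of the reachable set to extend the driving corridor), the following Python toy sketch may help. The interval-based reachable set, the stub policy, and all names are assumptions made for illustration, not the authors' implementation.

```python
"""Toy sketch of RL-guided reachable-set planning (illustrative only)."""
from dataclasses import dataclass
import random

ACTIONS = ["keep", "accelerate", "decelerate", "change_left", "change_right"]

@dataclass
class EgoState:
    s: float      # longitudinal position [m]
    v: float      # speed [m/s]
    lane: int     # lane index (-1, 0, 1)

def reachable_interval(state: EgoState, dt: float, a_max: float = 3.0):
    """Longitudinal interval reachable within dt under bounded acceleration."""
    s_min = state.s + max(state.v - a_max * dt, 0.0) * dt
    s_max = state.s + (state.v + a_max * dt) * dt
    return s_min, s_max

def split_by_lane(interval, lane, lanes=(-1, 0, 1)):
    """Lateral-first split: one candidate region per admissible lane."""
    return {l: interval for l in lanes if abs(l - lane) <= 1}

class StubPolicy:
    """Placeholder for the trained decision model (e.g. a DDQN)."""
    def act(self, state: EgoState) -> str:
        return random.choice(ACTIONS)

def plan_corridor(state: EgoState, policy, horizon=10, dt=0.2):
    """Build a driving corridor as one selected region per planning step."""
    corridor = []
    for _ in range(horizon):
        regions = split_by_lane(reachable_interval(state, dt), state.lane)
        action = policy.act(state)
        # Predict the ego motion implied by the macro action (very coarse).
        if action == "accelerate":
            state.v += 1.0 * dt
        elif action == "decelerate":
            state.v = max(state.v - 1.0 * dt, 0.0)
        elif action == "change_left" and state.lane + 1 in regions:
            state.lane += 1
        elif action == "change_right" and state.lane - 1 in regions:
            state.lane -= 1
        state.s += state.v * dt
        # Keep the region that contains the predicted ego position.
        corridor.append((state.lane, regions[state.lane]))
    return corridor

if __name__ == "__main__":
    print(plan_corridor(EgoState(s=0.0, v=10.0, lane=0), StubPolicy()))
```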
Received: 2024-08-27    Published: 2025-08-25
CLC:  U 467  
Supported by the National Natural Science Foundation of China (52272426).
Corresponding author: Xiaofei PEI. E-mail: gaohongwei@faw.com.cn; peixiaofei7@163.com
About the author: Hongwei GAO (b. 1982), male, senior engineer, Ph.D., engaged in research on intelligent vehicle technology. orcid.org/0009-0007-7326-7143. E-mail: gaohongwei@faw.com.cn

Cite this article:


Hongwei GAO, Bingxu SHANG, Xinkang ZHANG, Hongfeng WANG, Wei HE, Xiaofei PEI. Decision-making and planning of intelligent vehicle based on reachable set and reinforcement learning. Journal of Zhejiang University (Engineering Science), 2025, 59(9): 1996-2004.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2025.09.023        https://www.zjujournals.com/eng/CN/Y2025/V59/I9/1996

Fig. 1  Reachable-set planning architecture guided by reinforcement learning decision-making
Decided driving behavior | Predicates to be searched
$a_j$ | if $L_1$: $\mathrm{Inlanelets}(\mathrm{Veh}_0, \{L_1\})$; $\mathrm{Inlanelets}(\mathrm{Veh}_0, \{L_0\})$; if $L_{-1}$: $\mathrm{Inlanelets}(\mathrm{Veh}_0, \{L_{-1}\})$
Lcf | $\mathrm{Inlanelets}(\mathrm{Veh}_0, \{L_1, L_0\})$; $\mathrm{Inlanelets}(\mathrm{Veh}_0, \{L_1\})$; $\mathrm{Inlanelets}(\mathrm{Veh}_0, \{L_0\})$
Lcr | $\mathrm{Inlanelets}(\mathrm{Veh}_0, \{L_0, L_{-1}\})$; $\mathrm{Inlanelets}(\mathrm{Veh}_0, \{L_{-1}\})$; $\mathrm{Inlanelets}(\mathrm{Veh}_0, \{L_0\})$
Table 1  Lateral predicates to be searched
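Read as data, Table 1 maps each decided lateral behavior to the lane-occupancy predicates that must be searched. The small Python sketch below renders that lookup; the interpretation of the behavior labels (lane keeping for $a_j$, change toward the left lane $L_1$ for Lcf, toward the right lane $L_{-1}$ for Lcr) and the `inlanelets` helper are assumptions made here for illustration.

```python
# Hypothetical rendering of Table 1: which Inlanelets predicates to
# evaluate for a given decided lateral behavior of the ego vehicle Veh0.

def inlanelets(occupied_lanes, lane_set):
    """True if the ego vehicle occupies exactly the given set of lanelets."""
    return set(occupied_lanes) == set(lane_set)

def predicates_to_search(behavior, has_left=True, has_right=True):
    """Return the candidate lanelet sets to test, following Table 1."""
    if behavior == "keep":                      # a_j in Table 1 (assumed: lane keeping)
        candidates = [{"L0"}]
        if has_left:
            candidates.append({"L1"})
        if has_right:
            candidates.append({"L-1"})
    elif behavior == "Lcf":                     # assumed: lane change toward L1
        candidates = [{"L1", "L0"}, {"L1"}, {"L0"}]
    elif behavior == "Lcr":                     # assumed: lane change toward L-1
        candidates = [{"L0", "L-1"}, {"L-1"}, {"L0"}]
    else:
        raise ValueError(behavior)
    return candidates

# Example: a region whose occupancy matches any candidate set is kept.
print(any(inlanelets(["L1", "L0"], c) for c in predicates_to_search("Lcf")))
```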
Fig. 2  Schematic of lateral semantic segmentation results
Fig. 3  Schematic of final segmented regions
Fig. 4  Driving corridor generation process
Fig. 5  Schematic of lane-change channel
Fig. 6  Dynamic and uncertain test scenario
Parameter | Random variable
Initial position of vehicles (front) / m | $U(10, 30)$ (checkpoint A)$^{1)}$ / $U(470, 500)$ (checkpoint C)
Initial position of vehicles (middle) / m | $U(40, 75)$ (checkpoint A) / $U(510, 545)$ (checkpoint C)
Initial position of vehicles (rear) / m | $U(90, 120)$ (checkpoint A) / $U(560, 590)$ (checkpoint C)
Initial position of stationary vehicles / m | $U(120, 320)$ (checkpoint A) / $U(590, 760)$ (checkpoint C)
Probability of a stationary vehicle straddling a lane line / % | 25
Initial vehicle speed / (m·s$^{-1}$) | $U(6, 15)$
1) Note: U denotes a random variable uniformly distributed over the given interval.
Table 2  Distribution parameters of random variables in the test scenario
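For illustration, the randomization in Table 2 could be reproduced with draws like the following; the interval bounds come from the table, while the function itself and its field names are only an assumed sketch.

```python
import random

def sample_scenario(checkpoint="A"):
    """Sample one dynamic, uncertain test scenario following Table 2.
    Positions are in metres, speeds in m/s; the second set of offsets
    corresponds to checkpoint C."""
    ranges = {
        "A": {"front": (10, 30), "middle": (40, 75), "rear": (90, 120),
              "static": (120, 320)},
        "C": {"front": (470, 500), "middle": (510, 545), "rear": (560, 590),
              "static": (590, 760)},
    }[checkpoint]
    scenario = {name: random.uniform(*rng) for name, rng in ranges.items()}
    scenario["static_on_lane_line"] = random.random() < 0.25   # 25 % straddling
    scenario["initial_speed"] = random.uniform(6, 15)          # U(6, 15) m/s
    return scenario

print(sample_scenario("A"))
```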
Parameter | Description | Value
Hidden layer sizes | Number of neurons per layer | (256, 128)
Discount factor | Weights long-term discounted reward | 0.99
Exploration rate | ε-greedy policy | 1.0 → 0.02
Learning-rate decay factor | Ratio by which the learning rate is reduced | 0.8
Minimum learning rate | Lower bound on the learning rate | 0.00001
Learning-rate decay interval | Training steps between learning-rate reductions | 20000
Activation function | Adds nonlinearity to the network | ReLU
Loss function | Fitting error used to propagate gradients | Huber loss
Batch size | Samples drawn per training step | 32
Soft update rate | Target-network update coefficient | 0.01
Replay buffer size | Capacity for stored training samples | 100000
Gradient clipping | Maximum gradient magnitude | 10
Optimizer | Gradient descent algorithm | Adam
Table 3  Main hyperparameters of the reinforcement learning model
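A plausible PyTorch rendering of the Table 3 settings is sketched below. The hyperparameter values are those reported in the table; the initial learning rate, the state/action dimensions, and how the learning-rate floor and gradient clipping enter the training loop are assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

# Hyperparameters from Table 3 (values as reported; dimensions assumed).
GAMMA = 0.99                      # discount factor
EPS_START, EPS_END = 1.0, 0.02    # epsilon-greedy exploration schedule
LR_DECAY = 0.8                    # learning-rate decay factor
LR_MIN = 1e-5                     # minimum learning rate (floor applied in the loop)
LR_DECAY_STEP = 20_000            # training steps between decays
BATCH_SIZE = 32
TAU = 0.01                        # soft update rate of the target network
BUFFER_SIZE = 100_000             # replay buffer capacity
GRAD_CLIP = 10.0                  # e.g. torch.nn.utils.clip_grad_norm_ in the update step

class QNet(nn.Module):
    """Q-network with hidden layers of 256 and 128 ReLU units."""
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )
    def forward(self, x):
        return self.net(x)

q_net, target_net = QNet(16, 5), QNet(16, 5)      # dims are placeholders
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)   # initial lr assumed
scheduler = torch.optim.lr_scheduler.StepLR(
    optimizer, step_size=LR_DECAY_STEP, gamma=LR_DECAY)
loss_fn = nn.SmoothL1Loss()       # Huber loss

def soft_update(target: nn.Module, source: nn.Module, tau: float = TAU):
    """theta_target <- tau * theta_source + (1 - tau) * theta_target."""
    for t, s in zip(target.parameters(), source.parameters()):
        t.data.mul_(1.0 - tau).add_(tau * s.data)
```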
Fig. 7  Average reward of DDQN model during training
Fig. 8  Average speed of DDQN model during training
Method | $\bar v/({\mathrm{m}}\cdot{\mathrm{s}}^{-1})$ | $n_{\mathrm{d}}$/times | $a_{{\mathrm{l}},{\mathrm{RMS}}}/({\mathrm{m}}\cdot{\mathrm{s}}^{-2})$ | $t_{\mathrm{s}}/{\mathrm{ms}}$
MOBIL+IDM[20-21] | 8.82 | 33.6 | 2.398 | 10
Traditional reachable set[22] | 9.28 | 11.5 | 2.840 | 5286
Reachable set based on dynamic programming[8] | 10.21 | 80.9 | 1.698 | 6860
Reachable set based on reinforcement learning (proposed) | 11.33 | 61.0 | 1.589 | 6280
Table 4  Statistical comparison results in dynamic and uncertain scenarios
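For reference, the two continuous metrics reported in Table 4, the average speed $\bar v$ and the RMS lateral acceleration $a_{{\mathrm{l}},{\mathrm{RMS}}}$, are simple statistics over the logged trajectory. A minimal NumPy sketch (not the paper's evaluation script) is:

```python
import numpy as np

def average_speed(v: np.ndarray) -> float:
    """Mean speed over the logged trajectory, in m/s."""
    return float(np.mean(v))

def rms_lateral_acceleration(a_lat: np.ndarray) -> float:
    """Root-mean-square lateral acceleration, in m/s^2 (comfort metric)."""
    return float(np.sqrt(np.mean(np.square(a_lat))))

# Example with synthetic logs sampled at 10 Hz.
v = np.full(100, 11.3)                           # constant 11.3 m/s
a_lat = 0.5 * np.sin(np.linspace(0.0, 10.0, 100))
print(average_speed(v), rms_lateral_acceleration(a_lat))
```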
Fig. 9  Schematic of typical scenario
Fig. 10  Trajectory and speed results in typical scenario
Fig. 11  Distance between ego vehicle and preceding vehicle in typical scenario
Fig. 12  Driving corridor at 21 s in typical scenario
1 ZHU Bing, JIA Shizheng, ZHAO Jian, et al Review of research on decision-making and planning for automated vehicles[J]. China Journal of Highway and Transport, 2024, 37 (1): 215- 240 (in Chinese)
2 NÉMETH B, GÁSPÁR P Hierarchical motion control strategies for handling interactions of automated vehicles[J]. Control Engineering Practice, 2023, 136: 105523
doi: 10.1016/j.conengprac.2023.105523
3 XIONG L, ZHANG Y, LIU Y, et al Integrated decision making and planning based on feasible region construction for autonomous vehicles considering prediction uncertainty[J]. IEEE Transactions on Intelligent Vehicles, 2023, 8 (11): 4515- 4523
doi: 10.1109/TIV.2023.3299845
4 XIN L, KONG Y, LI S E, et al Enable faster and smoother spatio-temporal trajectory planning for autonomous vehicles in constrained dynamic environment[J]. Proceedings of the Institution of Mechanical Engineers, Part D: Journal of Automobile Engineering, 2021, 235 (4): 1101- 1112
doi: 10.1177/0954407020906627
5 MARTINEZ ROCAMORA B, PEREIRA G A S Parallel sensor-space lattice planner for real-time obstacle avoidance[J]. Sensors, 2022, 22 (13): 4770
doi: 10.3390/s22134770
6 MANZINGER S, PEK C, ALTHOFF M Using reachable sets for trajectory planning of automated vehicles[J]. IEEE Transactions on Intelligent Vehicles, 2021, 6 (2): 232- 248
doi: 10.1109/TIV.2020.3017342
7 HANG P, LV C, HUANG C, et al An integrated framework of decision making and motion planning for autonomous vehicles considering social behaviors[J]. IEEE Transactions on Vehicular Technology, 2020, 69 (12): 14458- 14469
doi: 10.1109/TVT.2020.3040398
8 ZHANG X, YANG B, PEI X, et al Trajectory planning based on spatio-temporal reachable set considering dynamic probabilistic risk[J]. Engineering Applications of Artificial Intelligence, 2023, 123: 106291
doi: 10.1016/j.engappai.2023.106291
9 SÖNTGES S, ALTHOFF M Computing the drivable area of autonomous road vehicles in dynamic road scenes[J]. IEEE Transactions on Intelligent Transportation Systems, 2018, 19 (6): 1855- 1866
doi: 10.1109/TITS.2017.2742141
10 MASCETTA T, LIU E I, ALTHOFF M. Rule-compliant multi-agent driving corridor generation using reachable sets and combinatorial negotiations [C]// Proceedings of the IEEE Intelligent Vehicles Symposium. Jeju Island: IEEE, 2024: 1417–1423.
11 LERCHER F, ALTHOFF M. Specification-compliant reachability analysis for autonomous vehicles using on-the-fly model checking [C]// Proceedings of the IEEE Intelligent Vehicles Symposium. Jeju Island: IEEE, 2024: 1484–1491.
12 ZHU Z, ZHAO H A survey of deep RL and IL for autonomous driving policy learning[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23 (9): 14043- 14065
doi: 10.1109/TITS.2021.3134702
13 DUAN J, EBEN LI S, GUAN Y, et al Hierarchical reinforcement learning for self-driving decision-making without reliance on labelled driving data[J]. IET Intelligent Transport Systems, 2020, 14 (5): 297- 305
doi: 10.1049/iet-its.2019.0317
14 TRAUTH R, HOBMEIER A, BETZ J. A reinforcement learning-boosted motion planning framework: comprehensive generalization performance in autonomous driving [EB/OL]. (2024-02-02)[2025-06-16]. https://arxiv.org/abs/2402.01465v1.
15 YU J, ARAB A, YI J, et al Hierarchical framework integrating rapidly-exploring random tree with deep reinforcement learning for autonomous vehicle[J]. Applied Intelligence, 2023, 53 (13): 16473- 16486
doi: 10.1007/s10489-022-04358-7
16 JAFARI R, ASHARI A E, HUBER M. CHAMP: integrated logic with reinforcement learning for hybrid decision making for autonomous vehicle planning [C]// Proceedings of the American Control Conference. San Diego: IEEE, 2023: 3310–3315.
17 CHEN D, JIANG L, WANG Y, et al. Autonomous driving using safe reinforcement learning by incorporating a regret-based human lane-changing decision model [C]// Proceedings of the American Control Conference. Denver: IEEE, 2020: 4355–4361.
18 ZHOU H, PEI X, LIU Y, et al. Trajectory planning for autonomous vehicles at urban intersections based on reachable sets [C]// IEEE Intelligent Vehicle Symposium. Cluj Napoca: IEEE, 2025: 1101–1107.
19 LI Guofa, CHEN Yaoyu, LV Chen, et al Key techniques of semantic analysis of driving behavior in decision making of autonomous vehicles[J]. Journal of Automotive Safety and Energy, 2019, 10 (4): 391- 412 (in Chinese)
doi: 10.3969/j.issn.1674-8484.2019.04.001
20 QIAN L, XU X, ZENG Y, et al Synchronous maneuver searching and trajectory planning for autonomous vehicles in dynamic traffic environments[J]. IEEE Intelligent Transportation Systems Magazine, 2022, 14 (1): 57- 73
doi: 10.1109/MITS.2019.2953551
21 TREIBER M, HENNECKE A, HELBING D Congested traffic states in empirical observations and microscopic simulations[J]. Physical Review E, Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics, 2000, 62 (2A): 1805- 1824
22 ZHOU Xingzhen, PEI Xiaofei, ZHANG Xinkang Trajectory planning of intelligent vehicle based on reachable set and optimization[J]. Journal of Wuhan University of Technology, 2022, 44 (6): 39- 48 (in Chinese)
doi: 10.3963/j.issn.1671-4431.2022.06.007