Journal of ZheJiang University (Engineering Science)  2025, Vol. 59 Issue (9): 1996-2004    DOI: 10.3785/j.issn.1008-973X.2025.09.023
    
Decision-making and planning of intelligent vehicle based on reachable set and reinforcement learning
Hongwei GAO1, Bingxu SHANG1, Xinkang ZHANG2, Hongfeng WANG1, Wei HE2, Xiaofei PEI2,*
1. R&D Center, China FAW Group Corporation, Changchun 130011, China
2. School of Automotive Engineering, Wuhan University of Technology, Wuhan 430070, China

Abstract  

A decision-making and planning algorithm integrating reachable sets with reinforcement learning (RL) was proposed to address the limitations of traditional reachable-set methods, which cannot effectively handle behavioral interactions between the intelligent vehicle and adjacent vehicles in dynamic, uncertain environments and incur excessive computational cost. An RL model was incorporated into the algorithm framework to guide multi-step decision-making, explicitly defining the sequence of macroscopic driving behaviors over the planning horizon. First, an RL decision model was established and formulated as a Markov decision process (MDP), with the state space, action space, and reward function designed accordingly. Second, feasible driving regions were partitioned according to driving semantics: lateral and longitudinal behavioral predicates were introduced to segment the reachable region at each time step into a finite number of feasible areas via a two-stage (lateral-first, then longitudinal) segmentation. Finally, the ego vehicle's position was predicted from the RL model's outputs to determine the optimal driving region at each step and form a driving corridor. The algorithm's effectiveness was validated through long-duration cyclic tests in dynamic, uncertain scenarios and comparative analysis of typical cases. Experimental results showed that, compared with existing reachable-set algorithms, the proposed method achieved better overall performance in driving efficiency, safety, comfort, and real-time capability.
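To make the MDP formulation concrete, the following minimal Python sketch illustrates one plausible state/action/reward design of the kind described above. The abstract does not specify the exact features, action set, or reward weights, so every name and value here is an illustrative assumption, not the authors' implementation.

```python
import numpy as np
from dataclasses import dataclass

# Assumed macro-level action space: longitudinal adjustments in the current
# lane plus left/right lane changes (Lcf/Lcr in the paper's notation).
ACTIONS = ["accelerate", "cruise", "decelerate", "lane_change_left", "lane_change_right"]

@dataclass
class VehicleState:
    s: float   # longitudinal position /m
    lane: int  # lane index (L1 = 1, L0 = 0, L-1 = -1)
    v: float   # speed /(m*s^-1)

def observe(ego: VehicleState, neighbors: list) -> np.ndarray:
    """Assumed state vector: ego speed plus relative gap, relative speed,
    and relative lane index of each surrounding vehicle."""
    feats = [ego.v]
    for nb in neighbors:
        feats += [nb.s - ego.s, nb.v - ego.v, nb.lane - ego.lane]
    return np.asarray(feats, dtype=np.float32)

def reward(ego: VehicleState, collided: bool, lane_changed: bool, v_des: float = 15.0) -> float:
    """Illustrative reward balancing efficiency (track a desired speed),
    comfort (penalize lane changes), and safety (large collision penalty)."""
    r = -abs(ego.v - v_des) / v_des
    if lane_changed:
        r -= 0.1
    if collided:
        r -= 100.0
    return r
```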



Key words: intelligent vehicle; trajectory planning; reachable set; reinforcement learning; driving corridor
Received: 27 August 2024      Published: 25 August 2025
CLC:  U 467  
Fund: National Natural Science Foundation of China (52272426).
Corresponding author: Xiaofei PEI     E-mail: gaohongwei@faw.com.cn; peixiaofei7@163.com
Cite this article:
Hongwei GAO, Bingxu SHANG, Xinkang ZHANG, Hongfeng WANG, Wei HE, Xiaofei PEI. Decision-making and planning of intelligent vehicle based on reachable set and reinforcement learning. Journal of ZheJiang University (Engineering Science), 2025, 59(9): 1996-2004.
URL: https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2025.09.023 OR https://www.zjujournals.com/eng/Y2025/V59/I9/1996


Fig.1 Reachable-set planning architecture guided by reinforcement learning decisions
Decided driving behavior | Predicates to be searched
$a_j$ | $\mathrm{if}\;L_1{:}\;{\rm{Inlanelets}}({\rm{Veh}}_0,\{L_1\})$; ${\rm{Inlanelets}}({\rm{Veh}}_0,\{L_0\})$; $\mathrm{if}\;L_{-1}{:}\;{\rm{Inlanelets}}({\rm{Veh}}_0,\{L_{-1}\})$
$\mathrm{Lcf}$ | ${\rm{Inlanelets}}({\rm{Veh}}_0,\{L_1,L_0\})$; ${\rm{Inlanelets}}({\rm{Veh}}_0,\{L_1\})$; ${\rm{Inlanelets}}({\rm{Veh}}_0,\{L_0\})$
$\mathrm{Lcr}$ | ${\rm{Inlanelets}}({\rm{Veh}}_0,\{L_0,L_{-1}\})$; ${\rm{Inlanelets}}({\rm{Veh}}_0,\{L_{-1}\})$; ${\rm{Inlanelets}}({\rm{Veh}}_0,\{L_0\})$
Tab.1 Required lateral search predicates
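As an illustration of how the lateral predicates in Tab.1 could gate the first-stage segmentation, here is a minimal Python sketch. Lanelets are simplified to integer lane indices and the behavior-to-predicate mapping mirrors the table; both are assumptions made for clarity, not the paper's lanelet-map implementation.

```python
# Lanelets reduced to integer lane indices: 1 = left lane L1,
# 0 = current lane L0, -1 = right lane L-1.
def inlanelets(occupied: set, lanelets: set) -> bool:
    """Inlanelets(Veh0, L): true if the vehicle's occupied lanes lie
    entirely within the lanelet set L."""
    return occupied <= lanelets

# Lateral predicates to search per decided behavior, mirroring Tab.1:
# lane keeping checks single-lane occupancy; lane changes additionally
# accept the transition region spanning both lanes.
SEARCH_PREDICATES = {
    "keep": [{1}, {0}, {-1}],
    "Lcf":  [{1, 0}, {1}, {0}],
    "Lcr":  [{0, -1}, {-1}, {0}],
}

def lateral_split(cells: list, behavior: str) -> list:
    """First-stage (lateral) segmentation: retain reachable cells whose
    lane occupancy satisfies at least one predicate of the behavior."""
    preds = SEARCH_PREDICATES[behavior]
    return [c for c in cells if any(inlanelets(c, p) for p in preds)]

# Example: cells occupying {0}, {0, 1}, and {-1} under an Lcf decision.
print(lateral_split([{0}, {0, 1}, {-1}], "Lcf"))  # -> [{0}, {0, 1}]
```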
Fig.2 Schematic diagram of lateral semantic segmentation results
Fig.3 Diagram of final partitioned regions
Fig.4 Driving corridor generation process
Fig.5 Schematic diagram of lane changing
Fig.6 Dynamic uncertain test scenario
Parameter | Random variable
Initial position of front vehicle/m | U(10, 30) (checkpoint A) / U(470, 500) (checkpoint C)
Initial position of middle vehicle/m | U(40, 75) (checkpoint A) / U(510, 545) (checkpoint C)
Initial position of rear vehicle/m | U(90, 120) (checkpoint A) / U(560, 590) (checkpoint C)
Initial position of stationary vehicle/m | U(120, 320) (checkpoint A) / U(590, 760) (checkpoint C)
Probability of stationary vehicle straddling lane line/% | 25
Initial vehicle speed/(m·s⁻¹) | U(6, 15)
Note: U(a, b) denotes a random variable uniformly distributed on the interval (a, b).
Tab.2 Random variable distribution parameters in test scenario
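Sampling one random test episode from Tab.2 is straightforward; the sketch below draws the listed uniform variables with NumPy. The ranges, the 25% straddling probability, and the speed distribution come directly from the table, while the episode structure (one dict per scenario) is an assumption.

```python
import numpy as np

# Uniform ranges copied from Tab.2 (positions in m), keyed by checkpoint.
RANGES = {
    "A": {"front": (10, 30), "middle": (40, 75), "rear": (90, 120), "static": (120, 320)},
    "C": {"front": (470, 500), "middle": (510, 545), "rear": (560, 590), "static": (590, 760)},
}

def sample_scenario(rng: np.random.Generator, checkpoint: str = "A") -> dict:
    """Draw one random test episode: initial positions of the three moving
    vehicles and the stationary vehicle, lane-line straddling, and speeds."""
    scn = {k: rng.uniform(lo, hi) for k, (lo, hi) in RANGES[checkpoint].items()}
    scn["static_straddles_line"] = rng.random() < 0.25       # 25 % per Tab.2
    scn["initial_speeds"] = rng.uniform(6.0, 15.0, size=3)   # U(6, 15) m/s
    return scn

print(sample_scenario(np.random.default_rng(0)))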
Parameter | Description | Value
Hidden layer sizes | number of neurons in each layer | (256, 128)
Discount factor | weights the long-term discounted reward | 0.99
Exploration rate | ε-greedy policy, decayed during training | 1.0 → 0.02
Learning-rate decay factor | ratio by which the learning rate is reduced | 0.8
Minimum learning rate | lower bound of the learning rate | 0.00001
Learning-rate decay step | interval (in training steps) between reductions | 20 000
Activation function | introduces nonlinearity into the network | ReLU
Loss function | computes the fitting error for gradient propagation | Huber loss
Batch size | number of samples drawn per training step | 32
Soft update rate | target-network update coefficient | 0.01
Replay buffer size | capacity for stored training samples | 100 000
Gradient clipping | maximum propagated gradient magnitude | 10
Optimizer | gradient-descent algorithm | Adam
Tab.3 Main hyperparameters of RL model
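The hyperparameters in Tab.3 map directly onto a standard DDQN training step, as in the PyTorch sketch below. The network input/output sizes and the choice of norm-based clipping are assumptions; the layer widths, discount, Huber loss, Adam optimizer, soft-update rate, and learning-rate schedule follow the table (the ε-greedy rate would be decayed from 1.0 to 0.02 in the action-selection loop, which is omitted here).

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    def __init__(self, n_obs: int, n_actions: int):
        super().__init__()
        # Hidden layers (256, 128) with ReLU activations, per Tab.3.
        self.net = nn.Sequential(
            nn.Linear(n_obs, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, x):
        return self.net(x)

GAMMA, TAU, BATCH = 0.99, 0.01, 32      # discount, soft-update rate, batch size
BUFFER_SIZE, GRAD_CLIP = 100_000, 10.0  # replay capacity, clipping threshold

q = QNet(10, 5)                          # input/output sizes are placeholders
q_target = QNet(10, 5)
q_target.load_state_dict(q.state_dict())

opt = torch.optim.Adam(q.parameters(), lr=1e-3)
# Decay the learning rate by 0.8 every 20 000 steps; the 1e-5 floor in
# Tab.3 would be enforced by hand (StepLR itself has no minimum).
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=20_000, gamma=0.8)
loss_fn = nn.SmoothL1Loss()              # Huber loss

def update(s, a, r, s2, done):
    """One DDQN update on a sampled batch of transitions."""
    with torch.no_grad():
        a2 = q(s2).argmax(dim=1, keepdim=True)   # online net selects the action...
        y = r + GAMMA * (1 - done) * q_target(s2).gather(1, a2).squeeze(1)  # ...target net evaluates it
    pred = q(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = loss_fn(pred, y)
    opt.zero_grad()
    loss.backward()
    nn.utils.clip_grad_norm_(q.parameters(), GRAD_CLIP)  # clip gradients at 10
    opt.step()
    sched.step()
    with torch.no_grad():                # soft update of target net, tau = 0.01
        for p, pt in zip(q.parameters(), q_target.parameters()):
            pt.mul_(1 - TAU).add_(TAU * p)
```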
Fig.7 Average reward training results of DDQN model
Fig.8 Average speed training results of DDQN model
Method | $\bar v/({\mathrm{m}}\cdot{\mathrm{s}}^{-1})$ | $n_{\mathrm{d}}$/times | $a_{{\mathrm{l}},{\mathrm{RMS}}}/({\mathrm{m}}\cdot{\mathrm{s}}^{-2})$ | $t_{\mathrm{s}}/{\mathrm{ms}}$
MOBIL+IDM [20-21] | 8.823 | 3.6 | 2.39 | 8 010
Traditional reachable set [22] | 9.281 | 1.5 | 2.840 | 5 286
Reachable set based on dynamic programming [8] | 10.218 | 0.9 | 1.698 | 6 860
Reachable set based on reinforcement learning (proposed) | 11.336 | 1.0 | 1.589 | 6 280
Tab.4 Statistical comparison results in dynamic uncertain scenarios
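For reference, statistics of the kind reported in Tab.4 could be aggregated from logged test data as sketched below. The precise definitions are assumptions here: $a_{\mathrm{l}}$ is taken as lateral acceleration, and the danger count $n_{\mathrm{d}}$ is treated as an externally supplied event count.

```python
import numpy as np

def test_metrics(v: np.ndarray, a_l: np.ndarray, n_d: int, t_solve_ms: np.ndarray) -> dict:
    """Aggregate one cyclic test run into Tab.4-style statistics."""
    return {
        "v_mean": float(v.mean()),                    # average speed /(m*s^-1)
        "a_l_rms": float(np.sqrt(np.mean(a_l**2))),   # RMS lateral acceleration /(m*s^-2)
        "n_d": n_d,                                   # danger-event count (definition assumed)
        "t_s": float(t_solve_ms.mean()),              # mean per-cycle solve time /ms
    }
```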
Fig.9 Typical scene diagram
Fig.10 Trajectory and velocity results under a typical scenario
Fig.11 Distance to the preceding vehicle in a typical scenario
Fig.12 Driving corridor at t = 21 s in a typical scenario
[1]   ZHU Bing, JIA Shizheng, ZHAO Jian, et al. Review of research on decision-making and planning for automated vehicles[J]. China Journal of Highway and Transport, 2024, 37(1): 215-240.
[2]   NÉMETH B, GÁSPÁR P Hierarchical motion control strategies for handling interactions of automated vehicles[J]. Control Engineering Practice, 2023, 136: 105523
doi: 10.1016/j.conengprac.2023.105523
[3]   XIONG L, ZHANG Y, LIU Y, et al Integrated decision making and planning based on feasible region construction for autonomous vehicles considering prediction uncertainty[J]. IEEE Transactions on Intelligent Vehicles, 2023, 8 (11): 4515- 4523
doi: 10.1109/TIV.2023.3299845
[4]   XIN L, KONG Y, LI S E, et al Enable faster and smoother spatio-temporal trajectory planning for autonomous vehicles in constrained dynamic environment[J]. Proceedings of the Institution of Mechanical Engineers, Part D: Journal of Automobile Engineering, 2021, 235 (4): 1101- 1112
doi: 10.1177/0954407020906627
[5]   MARTINEZ ROCAMORA B, PEREIRA G A S Parallel sensor-space lattice planner for real-time obstacle avoidance[J]. Sensors, 2022, 22 (13): 4770
doi: 10.3390/s22134770
[6]   MANZINGER S, PEK C, ALTHOFF M Using reachable sets for trajectory planning of automated vehicles[J]. IEEE Transactions on Intelligent Vehicles, 2021, 6 (2): 232- 248
doi: 10.1109/TIV.2020.3017342
[7]   HANG P, LV C, HUANG C, et al An integrated framework of decision making and motion planning for autonomous vehicles considering social behaviors[J]. IEEE Transactions on Vehicular Technology, 2020, 69 (12): 14458- 14469
doi: 10.1109/TVT.2020.3040398
[8]   ZHANG X, YANG B, PEI X, et al Trajectory planning based on spatio-temporal reachable set considering dynamic probabilistic risk[J]. Engineering Applications of Artificial Intelligence, 2023, 123: 106291
doi: 10.1016/j.engappai.2023.106291
[9]   SÖNTGES S, ALTHOFF M Computing the drivable area of autonomous road vehicles in dynamic road scenes[J]. IEEE Transactions on Intelligent Transportation Systems, 2018, 19 (6): 1855- 1866
doi: 10.1109/TITS.2017.2742141
[10]   MASCETTA T, LIU E I, ALTHOFF M. Rule-compliant multi-agent driving corridor generation using reachable sets and combinatorial negotiations [C]// Proceedings of the IEEE Intelligent Vehicles Symposium. Jeju Island: IEEE, 2024: 1417–1423.
[11]   LERCHER F, ALTHOFF M. Specification-compliant reachability analysis for autonomous vehicles using on-the-fly model checking [C]// Proceedings of the IEEE Intelligent Vehicles Symposium. Jeju Island: IEEE, 2024: 1484–1491.
[12]   ZHU Z, ZHAO H A survey of deep RL and IL for autonomous driving policy learning[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23 (9): 14043- 14065
doi: 10.1109/TITS.2021.3134702
[13]   DUAN J, EBEN LI S, GUAN Y, et al Hierarchical reinforcement learning for self-driving decision-making without reliance on labelled driving data[J]. IET Intelligent Transport Systems, 2020, 14 (5): 297- 305
doi: 10.1049/iet-its.2019.0317
[14]   TRAUTH R, HOBMEIER A, BETZ J. A reinforcement learning-boosted motion planning framework: comprehensive generalization performance in autonomous driving [EB/OL]. (2024-02-02)[2025-06-16]. https://arxiv.org/abs/2402.01465v1.
[15]   YU J, ARAB A, YI J, et al Hierarchical framework integrating rapidly-exploring random tree with deep reinforcement learning for autonomous vehicle[J]. Applied Intelligence, 2023, 53 (13): 16473- 16486
doi: 10.1007/s10489-022-04358-7
[16]   JAFARI R, ASHARI A E, HUBER M. CHAMP: integrated logic with reinforcement learning for hybrid decision making for autonomous vehicle planning [C]// Proceedings of the American Control Conference. San Diego: IEEE, 2023: 3310–3315.
[17]   CHEN D, JIANG L, WANG Y, et al. Autonomous driving using safe reinforcement learning by incorporating a regret-based human lane-changing decision model [C]// Proceedings of the American Control Conference. Denver: IEEE, 2020: 4355–4361.
[18]   ZHOU H, PEI X, LIU Y, et al. Trajectory planning for autonomous vehicles at urban intersections based on reachable sets [C]// IEEE Intelligent Vehicle Symposium. Cluj Napoca: IEEE, 2025: 1101–1107.
[19]   LI Guofa, CHEN Yaoyu, LV Chen, et al. Key techniques of semantic analysis of driving behavior in decision making of autonomous vehicles[J]. Journal of Automotive Safety and Energy, 2019, 10(4): 391-412.
doi: 10.3969/j.issn.1674-8484.2019.04.001
[20]   QIAN L, XU X, ZENG Y, et al Synchronous maneuver searching and trajectory planning for autonomous vehicles in dynamic traffic environments[J]. IEEE Intelligent Transportation Systems Magazine, 2022, 14 (1): 57- 73
doi: 10.1109/MITS.2019.2953551
[21]   TREIBER M, HENNECKE A, HELBING D Congested traffic states in empirical observations and microscopic simulations[J]. Physical Review E, Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics, 2000, 62 (2A): 1805- 1824
[22]   ZHOU Xingzhen, PEI Xiaofei, ZHANG Xinkang. Trajectory planning of intelligent vehicle based on reachable set and optimization[J]. Journal of Wuhan University of Technology, 2022, 44(6): 39-48.
doi: 10.3963/j.issn.1671-4431.2022.06.007
[15] Jun-hui ZHANG,Xiao-man GUO,Jing-xian WANG,Zong-jie FU,Da-peng CHEN. Safety-enhanced multi-vehicle tracking based on joint probability data association[J]. Journal of ZheJiang University (Engineering Science), 2023, 57(11): 2170-2178.