Journal of Zhejiang University (Engineering Science), 2025, Vol. 59, Issue (9): 1996-2004    DOI: 10.3785/j.issn.1008-973X.2025.09.023
Traffic Engineering
Decision-making and planning of intelligent vehicle based on reachable set and reinforcement learning
Hongwei GAO1, Bingxu SHANG1, Xinkang ZHANG2, Hongfeng WANG1, Wei HE2, Xiaofei PEI2,*
1. R&D Center, China FAW Group Corporation, Changchun 130011, China
2. School of Automotive Engineering, Wuhan University of Technology, Wuhan 430070, China
Abstract:

A decision-making and planning algorithm integrating reachable sets with reinforcement learning (RL) was proposed to address the limitations of traditional reachable sets, which cannot effectively handle the behavioral interactions between the intelligent vehicle and adjacent vehicles in dynamic, uncertain environments and incur excessive computational cost. An RL model was incorporated into the algorithm framework to guide multi-step decision-making, explicitly specifying the sequence of macro driving behaviors over the planning horizon. First, an RL decision model was established and formulated as a Markov decision process (MDP), with the state space, action space, and reward function designed accordingly. Second, feasible driving regions were partitioned based on driving semantics: lateral and longitudinal behavioral predicates were introduced to segment the reachable region at each time step into a finite number of feasible regions via a two-stage (lateral-first, then longitudinal) segmentation. Finally, the ego vehicle's position was inferred from the action output by the RL model at each time step to determine the optimal driving region, forming a driving corridor. The effectiveness of the proposed algorithm was validated through long-duration cyclic tests in dynamic and uncertain scenarios and comparative analyses of typical scenarios. Experimental results demonstrated that, compared with existing reachable-set algorithms, the proposed method achieved better overall performance in terms of driving efficiency, safety, comfort, and real-time capability.

Key words: intelligent vehicle    trajectory planning    reachable set    reinforcement learning    driving corridor
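As a rough, self-contained illustration of the pipeline summarized in the abstract (an RL policy issues one macro driving action per step of the planning horizon, and that action selects one of the semantically segmented feasible regions of the reachable set to extend the driving corridor), the following Python toy sketch may help. The interval-based reachable set, the stub policy, and all names are assumptions made for illustration, not the authors' implementation.

```python
"""Toy sketch of RL-guided reachable-set planning (illustrative only)."""
from dataclasses import dataclass
import random

ACTIONS = ["keep", "accelerate", "decelerate", "change_left", "change_right"]

@dataclass
class EgoState:
    s: float      # longitudinal position [m]
    v: float      # speed [m/s]
    lane: int     # lane index (-1, 0, 1)

def reachable_interval(state: EgoState, dt: float, a_max: float = 3.0):
    """Longitudinal interval reachable within dt under bounded acceleration."""
    s_min = state.s + max(state.v - a_max * dt, 0.0) * dt
    s_max = state.s + (state.v + a_max * dt) * dt
    return s_min, s_max

def split_by_lane(interval, lane, lanes=(-1, 0, 1)):
    """Lateral-first split: one candidate region per admissible lane."""
    return {l: interval for l in lanes if abs(l - lane) <= 1}

class StubPolicy:
    """Placeholder for the trained decision model (e.g. a DDQN)."""
    def act(self, state: EgoState) -> str:
        return random.choice(ACTIONS)

def plan_corridor(state: EgoState, policy, horizon=10, dt=0.2):
    """Build a driving corridor as one selected region per planning step."""
    corridor = []
    for _ in range(horizon):
        regions = split_by_lane(reachable_interval(state, dt), state.lane)
        action = policy.act(state)
        # Predict the ego motion implied by the macro action (very coarse).
        if action == "accelerate":
            state.v += 1.0 * dt
        elif action == "decelerate":
            state.v = max(state.v - 1.0 * dt, 0.0)
        elif action == "change_left" and state.lane + 1 in regions:
            state.lane += 1
        elif action == "change_right" and state.lane - 1 in regions:
            state.lane -= 1
        state.s += state.v * dt
        # Keep the region that contains the predicted ego position.
        corridor.append((state.lane, regions[state.lane]))
    return corridor

if __name__ == "__main__":
    print(plan_corridor(EgoState(s=0.0, v=10.0, lane=0), StubPolicy()))
```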
Received: 2024-08-27    Published: 2025-08-25
CLC:  U 467  
Supported by the National Natural Science Foundation of China (52272426).
Corresponding author: Xiaofei PEI. E-mail: gaohongwei@faw.com.cn; peixiaofei7@163.com
About the author: Hongwei GAO (b. 1982), male, senior engineer, Ph.D., engaged in research on intelligent vehicle technology. orcid.org/0009-0007-7326-7143. E-mail: gaohongwei@faw.com.cn

Cite this article:


Hongwei GAO, Bingxu SHANG, Xinkang ZHANG, Hongfeng WANG, Wei HE, Xiaofei PEI. Decision-making and planning of intelligent vehicle based on reachable set and reinforcement learning. Journal of Zhejiang University (Engineering Science), 2025, 59(9): 1996-2004.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2025.09.023        https://www.zjujournals.com/eng/CN/Y2025/V59/I9/1996

Fig. 1  Reachable-set planning architecture guided by reinforcement learning decision-making
Decided driving behavior | Predicates to be searched
$a_j$ | if $L_1$: $\mathrm{Inlanelets}(\mathrm{Veh}_0, \{L_1\})$; $\mathrm{Inlanelets}(\mathrm{Veh}_0, \{L_0\})$; if $L_{-1}$: $\mathrm{Inlanelets}(\mathrm{Veh}_0, \{L_{-1}\})$
Lcf | $\mathrm{Inlanelets}(\mathrm{Veh}_0, \{L_1, L_0\})$; $\mathrm{Inlanelets}(\mathrm{Veh}_0, \{L_1\})$; $\mathrm{Inlanelets}(\mathrm{Veh}_0, \{L_0\})$
Lcr | $\mathrm{Inlanelets}(\mathrm{Veh}_0, \{L_0, L_{-1}\})$; $\mathrm{Inlanelets}(\mathrm{Veh}_0, \{L_{-1}\})$; $\mathrm{Inlanelets}(\mathrm{Veh}_0, \{L_0\})$
Table 1  Lateral predicates to be searched
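Read as data, Table 1 maps each decided lateral behavior to the lane-occupancy predicates that must be searched. The small Python sketch below renders that lookup; the interpretation of the behavior labels (lane keeping for $a_j$, change toward the left lane $L_1$ for Lcf, toward the right lane $L_{-1}$ for Lcr) and the `inlanelets` helper are assumptions made here for illustration.

```python
# Hypothetical rendering of Table 1: which Inlanelets predicates to
# evaluate for a given decided lateral behavior of the ego vehicle Veh0.

def inlanelets(occupied_lanes, lane_set):
    """True if the ego vehicle occupies exactly the given set of lanelets."""
    return set(occupied_lanes) == set(lane_set)

def predicates_to_search(behavior, has_left=True, has_right=True):
    """Return the candidate lanelet sets to test, following Table 1."""
    if behavior == "keep":                      # a_j in Table 1 (assumed: lane keeping)
        candidates = [{"L0"}]
        if has_left:
            candidates.append({"L1"})
        if has_right:
            candidates.append({"L-1"})
    elif behavior == "Lcf":                     # assumed: lane change toward L1
        candidates = [{"L1", "L0"}, {"L1"}, {"L0"}]
    elif behavior == "Lcr":                     # assumed: lane change toward L-1
        candidates = [{"L0", "L-1"}, {"L-1"}, {"L0"}]
    else:
        raise ValueError(behavior)
    return candidates

# Example: a region whose occupancy matches any candidate set is kept.
print(any(inlanelets(["L1", "L0"], c) for c in predicates_to_search("Lcf")))
```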
Fig. 2  Schematic of lateral semantic segmentation results
Fig. 3  Schematic of final segmented regions
Fig. 4  Driving corridor generation process
Fig. 5  Schematic of lane-change channel
Fig. 6  Dynamic and uncertain test scenario
Parameter | Random variable
Initial position of vehicles (front) / m | $U(10, 30)$ (checkpoint A)$^{1)}$ / $U(470, 500)$ (checkpoint C)
Initial position of vehicles (middle) / m | $U(40, 75)$ (checkpoint A) / $U(510, 545)$ (checkpoint C)
Initial position of vehicles (rear) / m | $U(90, 120)$ (checkpoint A) / $U(560, 590)$ (checkpoint C)
Initial position of stationary vehicles / m | $U(120, 320)$ (checkpoint A) / $U(590, 760)$ (checkpoint C)
Probability of a stationary vehicle straddling a lane line / % | 25
Initial vehicle speed / (m·s$^{-1}$) | $U(6, 15)$
1) Note: U denotes a random variable uniformly distributed over the given interval.
Table 2  Distribution parameters of random variables in the test scenario
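For illustration, the randomization in Table 2 could be reproduced with draws like the following; the interval bounds come from the table, while the function itself and its field names are only an assumed sketch.

```python
import random

def sample_scenario(checkpoint="A"):
    """Sample one dynamic, uncertain test scenario following Table 2.
    Positions are in metres, speeds in m/s; the second set of offsets
    corresponds to checkpoint C."""
    ranges = {
        "A": {"front": (10, 30), "middle": (40, 75), "rear": (90, 120),
              "static": (120, 320)},
        "C": {"front": (470, 500), "middle": (510, 545), "rear": (560, 590),
              "static": (590, 760)},
    }[checkpoint]
    scenario = {name: random.uniform(*rng) for name, rng in ranges.items()}
    scenario["static_on_lane_line"] = random.random() < 0.25   # 25 % straddling
    scenario["initial_speed"] = random.uniform(6, 15)          # U(6, 15) m/s
    return scenario

print(sample_scenario("A"))
```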
Parameter | Description | Value
Hidden layer sizes | Number of neurons per layer | (256, 128)
Discount factor | Weights long-term discounted reward | 0.99
Exploration rate | ε-greedy policy | 1.0 → 0.02
Learning-rate decay factor | Ratio by which the learning rate is reduced | 0.8
Minimum learning rate | Lower bound on the learning rate | 0.00001
Learning-rate decay interval | Training steps between learning-rate reductions | 20000
Activation function | Adds nonlinearity to the network | ReLU
Loss function | Fitting error used to propagate gradients | Huber loss
Batch size | Samples drawn per training step | 32
Soft update rate | Target-network update coefficient | 0.01
Replay buffer size | Capacity for stored training samples | 100000
Gradient clipping | Maximum gradient magnitude | 10
Optimizer | Gradient descent algorithm | Adam
Table 3  Main hyperparameters of the reinforcement learning model
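A plausible PyTorch rendering of the Table 3 settings is sketched below. The hyperparameter values are those reported in the table; the initial learning rate, the state/action dimensions, and how the learning-rate floor and gradient clipping enter the training loop are assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

# Hyperparameters from Table 3 (values as reported; dimensions assumed).
GAMMA = 0.99                      # discount factor
EPS_START, EPS_END = 1.0, 0.02    # epsilon-greedy exploration schedule
LR_DECAY = 0.8                    # learning-rate decay factor
LR_MIN = 1e-5                     # minimum learning rate (floor applied in the loop)
LR_DECAY_STEP = 20_000            # training steps between decays
BATCH_SIZE = 32
TAU = 0.01                        # soft update rate of the target network
BUFFER_SIZE = 100_000             # replay buffer capacity
GRAD_CLIP = 10.0                  # e.g. torch.nn.utils.clip_grad_norm_ in the update step

class QNet(nn.Module):
    """Q-network with hidden layers of 256 and 128 ReLU units."""
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )
    def forward(self, x):
        return self.net(x)

q_net, target_net = QNet(16, 5), QNet(16, 5)      # dims are placeholders
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)   # initial lr assumed
scheduler = torch.optim.lr_scheduler.StepLR(
    optimizer, step_size=LR_DECAY_STEP, gamma=LR_DECAY)
loss_fn = nn.SmoothL1Loss()       # Huber loss

def soft_update(target: nn.Module, source: nn.Module, tau: float = TAU):
    """theta_target <- tau * theta_source + (1 - tau) * theta_target."""
    for t, s in zip(target.parameters(), source.parameters()):
        t.data.mul_(1.0 - tau).add_(tau * s.data)
```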
Fig. 7  Average reward of DDQN model during training
Fig. 8  Average speed of DDQN model during training
Method | $\bar v/({\mathrm{m}}\cdot{\mathrm{s}}^{-1})$ | $n_{\mathrm{d}}$/times | $a_{{\mathrm{l}},{\mathrm{RMS}}}/({\mathrm{m}}\cdot{\mathrm{s}}^{-2})$ | $t_{\mathrm{s}}/{\mathrm{ms}}$
MOBIL+IDM[20-21] | 8.82 | 33.6 | 2.398 | 10
Traditional reachable set[22] | 9.28 | 11.5 | 2.840 | 5286
Reachable set based on dynamic programming[8] | 10.21 | 80.9 | 1.698 | 6860
Reachable set based on reinforcement learning (proposed) | 11.33 | 61.0 | 1.589 | 6280
Table 4  Statistical comparison results in dynamic and uncertain scenarios
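For reference, the two continuous metrics reported in Table 4, the average speed $\bar v$ and the RMS lateral acceleration $a_{{\mathrm{l}},{\mathrm{RMS}}}$, are simple statistics over the logged trajectory. A minimal NumPy sketch (not the paper's evaluation script) is:

```python
import numpy as np

def average_speed(v: np.ndarray) -> float:
    """Mean speed over the logged trajectory, in m/s."""
    return float(np.mean(v))

def rms_lateral_acceleration(a_lat: np.ndarray) -> float:
    """Root-mean-square lateral acceleration, in m/s^2 (comfort metric)."""
    return float(np.sqrt(np.mean(np.square(a_lat))))

# Example with synthetic logs sampled at 10 Hz.
v = np.full(100, 11.3)                           # constant 11.3 m/s
a_lat = 0.5 * np.sin(np.linspace(0.0, 10.0, 100))
print(average_speed(v), rms_lateral_acceleration(a_lat))
```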
Fig. 9  Schematic of typical scenario
Fig. 10  Trajectory and speed results in typical scenario
Fig. 11  Distance between ego vehicle and preceding vehicle in typical scenario
Fig. 12  Driving corridor at 21 s in typical scenario
1 ZHU Bing, JIA Shizheng, ZHAO Jian, et al Review of research on decision-making and planning for automated vehicles[J]. China Journal of Highway and Transport, 2024, 37 (1): 215- 240 (in Chinese)
2 NÉMETH B, GÁSPÁR P Hierarchical motion control strategies for handling interactions of automated vehicles[J]. Control Engineering Practice, 2023, 136: 105523
doi: 10.1016/j.conengprac.2023.105523
3 XIONG L, ZHANG Y, LIU Y, et al Integrated decision making and planning based on feasible region construction for autonomous vehicles considering prediction uncertainty[J]. IEEE Transactions on Intelligent Vehicles, 2023, 8 (11): 4515- 4523
doi: 10.1109/TIV.2023.3299845
4 XIN L, KONG Y, LI S E, et al Enable faster and smoother spatio-temporal trajectory planning for autonomous vehicles in constrained dynamic environment[J]. Proceedings of the Institution of Mechanical Engineers, Part D: Journal of Automobile Engineering, 2021, 235 (4): 1101- 1112
doi: 10.1177/0954407020906627
5 MARTINEZ ROCAMORA B, PEREIRA G A S Parallel sensor-space lattice planner for real-time obstacle avoidance[J]. Sensors, 2022, 22 (13): 4770
doi: 10.3390/s22134770
6 MANZINGER S, PEK C, ALTHOFF M Using reachable sets for trajectory planning of automated vehicles[J]. IEEE Transactions on Intelligent Vehicles, 2021, 6 (2): 232- 248
doi: 10.1109/TIV.2020.3017342
7 HANG P, LV C, HUANG C, et al An integrated framework of decision making and motion planning for autonomous vehicles considering social behaviors[J]. IEEE Transactions on Vehicular Technology, 2020, 69 (12): 14458- 14469
doi: 10.1109/TVT.2020.3040398
8 ZHANG X, YANG B, PEI X, et al Trajectory planning based on spatio-temporal reachable set considering dynamic probabilistic risk[J]. Engineering Applications of Artificial Intelligence, 2023, 123: 106291
doi: 10.1016/j.engappai.2023.106291
9 SÖNTGES S, ALTHOFF M Computing the drivable area of autonomous road vehicles in dynamic road scenes[J]. IEEE Transactions on Intelligent Transportation Systems, 2018, 19 (6): 1855- 1866
doi: 10.1109/TITS.2017.2742141
10 MASCETTA T, LIU E I, ALTHOFF M. Rule-compliant multi-agent driving corridor generation using reachable sets and combinatorial negotiations [C]// Proceedings of the IEEE Intelligent Vehicles Symposium. Jeju Island: IEEE, 2024: 1417–1423.
11 LERCHER F, ALTHOFF M. Specification-compliant reachability analysis for autonomous vehicles using on-the-fly model checking [C]// Proceedings of the IEEE Intelligent Vehicles Symposium. Jeju Island: IEEE, 2024: 1484–1491.
12 ZHU Z, ZHAO H A survey of deep RL and IL for autonomous driving policy learning[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23 (9): 14043- 14065
doi: 10.1109/TITS.2021.3134702
13 DUAN J, EBEN LI S, GUAN Y, et al Hierarchical reinforcement learning for self-driving decision-making without reliance on labelled driving data[J]. IET Intelligent Transport Systems, 2020, 14 (5): 297- 305
doi: 10.1049/iet-its.2019.0317
14 TRAUTH R, HOBMEIER A, BETZ J. A reinforcement learning-boosted motion planning framework: comprehensive generalization performance in autonomous driving [EB/OL]. (2024-02-02)[2025-06-16]. https://arxiv.org/abs/2402.01465v1.
15 YU J, ARAB A, YI J, et al Hierarchical framework integrating rapidly-exploring random tree with deep reinforcement learning for autonomous vehicle[J]. Applied Intelligence, 2023, 53 (13): 16473- 16486
doi: 10.1007/s10489-022-04358-7
16 JAFARI R, ASHARI A E, HUBER M. CHAMP: integrated logic with reinforcement learning for hybrid decision making for autonomous vehicle planning [C]// Proceedings of the American Control Conference. San Diego: IEEE, 2023: 3310–3315.
17 CHEN D, JIANG L, WANG Y, et al. Autonomous driving using safe reinforcement learning by incorporating a regret-based human lane-changing decision model [C]// Proceedings of the American Control Conference. Denver: IEEE, 2020: 4355–4361.
18 ZHOU H, PEI X, LIU Y, et al. Trajectory planning for autonomous vehicles at urban intersections based on reachable sets [C]// IEEE Intelligent Vehicle Symposium. Cluj Napoca: IEEE, 2025: 1101–1107.
19 LI Guofa, CHEN Yaoyu, LV Chen, et al Key techniques of semantic analysis of driving behavior in decision making of autonomous vehicles[J]. Journal of Automotive Safety and Energy, 2019, 10 (4): 391- 412 (in Chinese)
doi: 10.3969/j.issn.1674-8484.2019.04.001
20 QIAN L, XU X, ZENG Y, et al Synchronous maneuver searching and trajectory planning for autonomous vehicles in dynamic traffic environments[J]. IEEE Intelligent Transportation Systems Magazine, 2022, 14 (1): 57- 73
doi: 10.1109/MITS.2019.2953551
21 TREIBER M, HENNECKE A, HELBING D Congested traffic states in empirical observations and microscopic simulations[J]. Physical Review E, Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics, 2000, 62 (2A): 1805- 1824
22 ZHOU Xingzhen, PEI Xiaofei, ZHANG Xinkang Trajectory planning of intelligent vehicle based on reachable set and optimization[J]. Journal of Wuhan University of Technology, 2022, 44 (6): 39- 48 (in Chinese)
doi: 10.3963/j.issn.1671-4431.2022.06.007