Journal of Zhejiang University (Engineering Science)  2024, Vol. 58 Issue (9): 1923-1934    DOI: 10.3785/j.issn.1008-973X.2024.09.017
Traffic Engineering
Intelligent connected vehicle motion planning at unsignalized intersections based on deep reinforcement learning
Mingfang ZHANG1, Jian MA1, Nale ZHAO2, Li WANG1, Ying LIU1
1. Beijing Key Laboratory of Urban Road Intelligent Traffic Control Technology, North China University of Technology, Beijing 100144, China
2. Key Laboratory of Road Safety Technology of Transport Industry, Research Institute of Highway, Ministry of Transport, Beijing 100088, China
Abstract:

A vehicle motion planning algorithm based on deep reinforcement learning was proposed to balance the traffic efficiency and comfort requirements of intelligent connected vehicles at unsignalized intersections. A temporal convolutional network (TCN) and a Transformer were combined to construct an intention prediction model for surrounding vehicles, in which multi-layer convolution and self-attention mechanisms improve the capture of vehicle motion features. The twin delayed deep deterministic policy gradient (TD3) reinforcement learning algorithm was used to build the vehicle motion planning model. The state space and reward function were designed by jointly considering the driving intention and driving style of surrounding vehicles, the interaction risk, and the comfort of the ego vehicle, in order to enhance understanding of the dynamic environment. Delayed policy updates and target policy smoothing were used to improve the stability of the algorithm, which outputs the desired acceleration in real time. Experimental results showed that the proposed algorithm perceives potential interaction risk in real time based on the driving intention of surrounding vehicles. The generated motion planning strategy met the requirements of efficiency, safety, and comfort, and adapted well to surrounding vehicles with different driving styles and to dense interaction scenarios, with success rates of at least 92.1% across the tested scenarios.

Key words: intelligent connected vehicle    deep reinforcement learning    unsignalized intersection    intention prediction    motion planning
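Received: 2023-07-29    Published: 2024-08-30
CLC:  V 467.1
Funding: National Key Research and Development Program of China (2022YFB4300400); Scientific Research Program of the Beijing Municipal Education Commission (KM202210009013); bilateral cooperation special project (中乌合作专项资助项目, 106051360024XN017-02).
About the author: Mingfang ZHANG (1989-), female, associate professor, Ph.D., works on perception and decision-making for intelligent vehicles. orcid.org/0000-0003-3727-3101. E-mail: mingfang@ncut.edu.cn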
Cite this article:

Mingfang ZHANG, Jian MA, Nale ZHAO, Li WANG, Ying LIU. Intelligent connected vehicle motion planning at unsignalized intersections based on deep reinforcement learning. Journal of Zhejiang University (Engineering Science), 2024, 58(9): 1923-1934.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2024.09.017
https://www.zjujournals.com/eng/CN/Y2024/V58/I9/1923

Fig. 1  TCN-Transformer network structure
Fig. 2  TD3 algorithm structure
Fig. 3  Schematic of the unsignalized intersection
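
The abstract describes the intention prediction model as a combination of multi-layer (temporal) convolution and self-attention. The sketch below illustrates those two building blocks in plain Python on a scalar sequence. It is not the authors' network: the kernel weights, dilations, the single scalar feature, the identity query/key/value projections, and the sample speed values are all assumptions chosen only for illustration.

```python
import math


def causal_dilated_conv(seq, kernel, dilation):
    """Causal 1-D convolution: y[t] = sum_k kernel[k] * seq[t - k * dilation].

    Samples before the start of the sequence are treated as zero, so the
    output at time t never depends on inputs later than t.
    """
    out = []
    for t in range(len(seq)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * seq[idx]
        out.append(acc)
    return out


def softmax(xs):
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention over scalar features.

    Assumption: identity projections and feature dimension 1, so the
    scaling factor 1/sqrt(d) equals 1.
    """
    out = []
    for q in features:
        weights = softmax([q * k for k in features])
        out.append(sum(w * v for w, v in zip(weights, features)))
    return out


# Hypothetical speed samples (m/s) of a surrounding vehicle approaching the intersection.
speeds = [8.0, 7.5, 6.8, 6.0, 5.1, 4.0]

layer1 = causal_dilated_conv(speeds, kernel=[0.5, 0.5], dilation=1)
layer2 = causal_dilated_conv(layer1, kernel=[0.5, 0.5], dilation=2)
context = self_attention(layer2)
print(context)
```

Stacking layers with growing dilation widens the time window each output sees, while the attention step lets every time step weight all others. A real model would use learned multi-channel kernels and learned projections.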
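| Algorithm | Actual intention | Predicted: straight | Predicted: right turn | Predicted: left turn | $p_{\mathrm{v}}$/% | FPS | $P_{\mathrm{m}}$ |
|---|---|---|---|---|---|---|---|
| CNN-Transformer | Straight | 149 | 20 | 21 | 93.1 | 333 | 354371 |
| | Right turn | 6 | 134 | 2 | 83.8 | | |
| | Left turn | 5 | 6 | 137 | 85.6 | | |
| LSTM | Straight | 153 | 18 | 14 | 95.6 | 300 | 88003 |
| | Right turn | 5 | 141 | 2 | 88.1 | | |
| | Left turn | 2 | 1 | 144 | 90.0 | | |
| TCN-Transformer | Straight | 157 | 5 | 4 | 98.1 | 250 | 454211 |
| | Right turn | 3 | 154 | 0 | 96.3 | | |
| | Left turn | 0 | 4 | 156 | 97.5 | | |

Table 1  Comparison of intention prediction results of different algorithms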
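
The accuracy column of Table 1 can be reproduced from the counts on the diagonal. The sketch below does this for the TCN-Transformer rows. The source does not state the sample size: the value of 160 samples per intention class is an assumption inferred from the reported percentages, and the function name is hypothetical.

```python
# Counts copied from the TCN-Transformer rows of Table 1.
# Row and column order: straight, right turn, left turn.
TCN_TRANSFORMER = [
    [157, 5, 4],
    [3, 154, 0],
    [0, 4, 156],
]

# Assumption: 160 samples per intention class (inferred, not stated in the source).
SAMPLES_PER_CLASS = 160


def per_class_accuracy(matrix, samples_per_class=SAMPLES_PER_CLASS):
    """Return the percentage of correctly predicted samples for each class."""
    return [100.0 * matrix[i][i] / samples_per_class for i in range(len(matrix))]


accuracies = per_class_accuracy(TCN_TRANSFORMER)
for name, acc in zip(["straight", "right turn", "left turn"], accuracies):
    print(f"{name}: {acc:.3f}%")
```

Under this assumption the computed values agree with the tabulated 98.1%, 96.3%, and 97.5% to within rounding.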
Fig. 4  Intention prediction results for surrounding vehicles
Fig. 5  Training scenario of the vehicle motion planning algorithm
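| Parameter | Value |
|---|---|
| Discount factor | 0.99 |
| Actor network learning rate | 0.00002 |
| Critic network learning rate | 0.0001 |
| Learning rate decay interval | $2\times10^{4}$ |
| Batch size | 64 |
| Replay buffer size | $3\times10^{5}$ |
| Initial exploration probability | 0.5 |
| Minimum exploration probability | 0.05 |
| Exploration probability decay steps | $2\times10^{4}$ |
| Policy frequency | 2 |

Table 2  Hyperparameter settings of the TD3 model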
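
Two of the Table 2 settings map directly onto the TD3 mechanisms named in the abstract: the discount factor enters the critic target, and the policy frequency sets how often the actor is updated relative to the critics. The following is a minimal sketch of the standard TD3 target with target policy smoothing, not the authors' implementation. The smoothing noise scale, noise bound, and acceleration limit are assumptions not given in the source, and the scalar "networks" are hypothetical stand-ins.

```python
import random

GAMMA = 0.99       # discount factor, from Table 2
POLICY_FREQ = 2    # policy frequency, from Table 2
NOISE_STD = 0.2    # assumption: smoothing noise scale, not given in the source
NOISE_CLIP = 0.5   # assumption: smoothing noise bound, not given in the source
A_MAX = 3.0        # assumption: acceleration limit in m/s^2, not given in the source


def clip(x, lo, hi):
    return max(lo, min(hi, x))


def td3_target(reward, next_state, done, target_actor, target_q1, target_q2, rng):
    """Critic target with target policy smoothing and the minimum of two critics."""
    noise = clip(rng.gauss(0.0, NOISE_STD), -NOISE_CLIP, NOISE_CLIP)
    next_action = clip(target_actor(next_state) + noise, -A_MAX, A_MAX)
    q_min = min(target_q1(next_state, next_action),
                target_q2(next_state, next_action))
    return reward + GAMMA * (0.0 if done else 1.0) * q_min


def actor_update_steps(total_critic_updates, policy_freq=POLICY_FREQ):
    """Critic update indices at which the actor and target networks are also updated."""
    return [t for t in range(1, total_critic_updates + 1) if t % policy_freq == 0]


# Hypothetical scalar stand-ins for the target networks, for illustration only.
toy_actor = lambda s: 0.5 * s
toy_q1 = lambda s, a: s + a
toy_q2 = lambda s, a: s + a + 1.0

rng = random.Random(0)
y = td3_target(1.0, 2.0, False, toy_actor, toy_q1, toy_q2, rng)
y_terminal = td3_target(1.0, 2.0, True, toy_actor, toy_q1, toy_q2, rng)
print(y, y_terminal, actor_update_steps(6))
```

Taking the minimum of the two critics counters value overestimation, and the clipped noise on the target action smooths the value estimate around it. With a policy frequency of 2, the actor is updated on every second critic update.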
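Fig. 6  Training results of the vehicle motion planning algorithm
Fig. 7  Variation curves of vehicle motion parameters
Fig. 8  Comparison of quantitative analysis results of the four algorithms
Fig. 9  Schematic of intersection scenarios with different traffic flow directions and densities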
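| Algorithm | Scenario | $t_{\mathrm{p}}$/s | $P_{\mathrm{s}}$/% |
|---|---|---|---|
| IDM | 1 | 16.2 | 67.4 |
| | 2 | 25.8 | 59.4 |
| TD3 | 1 | 10.5 | 78.2 |
| | 2 | 16.4 | 65.8 |
| TCN-Transformer-DDPG | 1 | 11.8 | 89.4 |
| | 2 | 15.6 | 80.2 |
| LSTM-TD3 | 1 | 9.1 | 90.2 |
| | 2 | 12.6 | 84.8 |
| TCN-Transformer-TD3 | 1 | 8.3 | 94.2 |
| | 2 | 11.4 | 92.1 |

Table 3  Success rate and average passing time of each algorithm in different scenarios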
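
To read Table 3 as relative gains, the short sketch below compares TCN-Transformer-TD3 with each baseline in the same scenario. The numbers are copied from the table, while the function name and the choice of metrics (percentage reduction in passing time, gain in success rate in percentage points) are illustrative assumptions rather than measures reported by the authors.

```python
# (algorithm, scenario) -> (average passing time in s, success rate in %), from Table 3.
RESULTS = {
    ("IDM", 1): (16.2, 67.4),
    ("IDM", 2): (25.8, 59.4),
    ("TD3", 1): (10.5, 78.2),
    ("TD3", 2): (16.4, 65.8),
    ("TCN-Transformer-DDPG", 1): (11.8, 89.4),
    ("TCN-Transformer-DDPG", 2): (15.6, 80.2),
    ("LSTM-TD3", 1): (9.1, 90.2),
    ("LSTM-TD3", 2): (12.6, 84.8),
    ("TCN-Transformer-TD3", 1): (8.3, 94.2),
    ("TCN-Transformer-TD3", 2): (11.4, 92.1),
}
PROPOSED = "TCN-Transformer-TD3"
BASELINES = ["IDM", "TD3", "TCN-Transformer-DDPG", "LSTM-TD3"]


def compare(baseline, scenario):
    """Return (passing-time reduction in %, success-rate gain in percentage points)."""
    t_base, p_base = RESULTS[(baseline, scenario)]
    t_prop, p_prop = RESULTS[(PROPOSED, scenario)]
    time_reduction = 100.0 * (t_base - t_prop) / t_base
    success_gain = p_prop - p_base
    return time_reduction, success_gain


for scenario in (1, 2):
    for baseline in BASELINES:
        reduction, gain = compare(baseline, scenario)
        print(f"scenario {scenario}, vs {baseline}: "
              f"time -{reduction:.1f}%, success +{gain:.1f} points")

# Lowest success rate of the proposed algorithm over both scenarios.
min_success = min(RESULTS[(PROPOSED, s)][1] for s in (1, 2))
```

The lowest success rate of the proposed algorithm across the two scenarios is the 92.1% figure quoted in the abstract.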