Journal of ZheJiang University (Engineering Science)  2024, Vol. 58 Issue (9): 1923-1934    DOI: 10.3785/j.issn.1008-973X.2024.09.017
    
Intelligent connected vehicle motion planning at unsignalized intersections based on deep reinforcement learning
Mingfang ZHANG1, Jian MA1, Nale ZHAO2, Li WANG1, Ying LIU1
1. Beijing Key Laboratory of Urban Road Intelligent Traffic Control Technology, North China University of Technology, Beijing 100144, China
2. Key Laboratory of Road Safety Technology of Transport Industry, Research Institute of Highway, Ministry of Transport, Beijing 100088, China

Abstract  

A vehicle motion planning algorithm based on deep reinforcement learning was proposed to satisfy the efficiency and comfort requirements of intelligent connected vehicles at unsignalized intersections. Temporal convolutional network (TCN) and Transformer algorithms were combined to construct the intention prediction model for surrounding vehicles, in which multi-layer convolution and self-attention mechanisms were used to improve the capability of capturing vehicle motion features. The twin delayed deep deterministic policy gradient (TD3) reinforcement learning algorithm was employed to build the vehicle motion planning model. The state space and reward functions were designed by comprehensively considering the driving intentions and driving styles of surrounding vehicles, the interaction risk, and the comfort of the ego vehicle, so as to enhance understanding of the dynamic environment. Delayed policy updates and target policy smoothing were adopted to improve the stability of the proposed algorithm, and the desired acceleration was output in real time. Experimental results demonstrated that the proposed motion planning algorithm could perceive potential interaction risks in real time based on the driving intentions of surrounding vehicles. The generated motion planning strategy met the requirements of efficiency, safety, and comfort, showed excellent adaptability to surrounding vehicles of different driving styles and to dense interaction scenarios, and the success rate exceeded 92.1% in all scenarios.
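The two TD3 stabilizers named in the abstract, target policy smoothing and the twin-critic (clipped double-Q) target, can be sketched in a few lines. This is an illustrative sketch only, not the paper's implementation; the action bounds and noise constants are assumptions, with the action taken to be the scalar desired acceleration the model outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

def smoothed_target_action(target_policy_action, noise_std=0.2, noise_clip=0.5,
                           a_low=-3.0, a_high=3.0):
    """Target policy smoothing: perturb the target action with clipped
    Gaussian noise, then clip to the (assumed) acceleration bounds."""
    noise = np.clip(rng.normal(0.0, noise_std), -noise_clip, noise_clip)
    return float(np.clip(target_policy_action + noise, a_low, a_high))

def td_target(reward, q1_next, q2_next, gamma=0.99, done=False):
    """Clipped double-Q: bootstrap from the minimum of the twin critics,
    which suppresses the overestimation bias of a single critic."""
    q_next = min(q1_next, q2_next)
    return reward + (0.0 if done else gamma * q_next)

# Example: one TD target for a transition with reward 1.0; the smaller
# critic estimate (9.0) is the one that gets bootstrapped.
a_t = smoothed_target_action(1.5)
y = td_target(reward=1.0, q1_next=10.0, q2_next=9.0)
print(round(y, 2))  # 1.0 + 0.99 * 9.0 = 9.91
```

The third TD3 ingredient, delayed policy updates, simply means the actor (and target networks) are updated once every few critic updates; Tab.2's policy update frequency of 2 corresponds to one actor update per two critic updates.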



Key words: intelligent connected vehicle; deep reinforcement learning; unsignalized intersection; intention prediction; motion planning
Received: 29 July 2023      Published: 30 August 2024
CLC:  V 467.1  
Fund:  National Key Research and Development Program of China (2022YFB4300400); Scientific Research Program of Beijing Municipal Education Commission (KM202210009013); China-Ukraine Cooperation Special Program (106051360024XN017-02).
Cite this article:

Mingfang ZHANG, Jian MA, Nale ZHAO, Li WANG, Ying LIU. Intelligent connected vehicle motion planning at unsignalized intersections based on deep reinforcement learning. Journal of ZheJiang University (Engineering Science), 2024, 58(9): 1923-1934.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2024.09.017     OR     https://www.zjujournals.com/eng/Y2024/V58/I9/1923


Fig.1 Network architecture of TCN-Transformer
Fig.2 Structure of TD3 algorithm
Fig.3 Diagram of unsignalized intersection
Algorithm | Actual intent | Predicted: straight | right turn | left turn | ${p_{\mathrm{v}}}$/% | FPS | ${P_{\mathrm{m}}}$
CNN-Transformer | straight | 149 | 20 | 21 | 93.1 | 333 | 354371
 | right turn | 6 | 134 | 2 | 83.8 | |
 | left turn | 5 | 6 | 137 | 85.6 | |
LSTM | straight | 153 | 18 | 14 | 95.6 | 300 | 88003
 | right turn | 5 | 141 | 2 | 88.1 | |
 | left turn | 2 | 1 | 144 | 90.0 | |
TCN-Transformer | straight | 157 | 5 | 4 | 98.1 | 250 | 454211
 | right turn | 3 | 154 | 0 | 96.3 | |
 | left turn | 0 | 4 | 156 | 97.5 | |
Tab.1 Comparison of intent prediction results from different algorithms
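One way to sanity-check Tab.1 is to recover each accuracy ${p_{\mathrm{v}}}$ from the confusion counts: for the CNN-Transformer rows, it equals the diagonal count divided by the corresponding column total (160 samples per class). This is our reading of the table, not code from the paper.

```python
import numpy as np

# CNN-Transformer block of Tab.1.
# Rows: actual intent (straight, right turn, left turn); columns: predicted.
m = np.array([[149, 20, 21],
              [  6, 134,  2],
              [  5,   6, 137]])

# p_v: diagonal (correct) count over the per-class total of 160, in percent.
p_v = 100.0 * np.diag(m) / m.sum(axis=0)
print(np.round(p_v, 1))  # [93.1 83.8 85.6], matching Tab.1
```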
Fig.4 Intent prediction results of surrounding vehicles
Fig.5 Training scenario of vehicle motion planning algorithms
Parameter | Value
Discount factor | 0.99
Actor network learning rate | 0.00002
Critic network learning rate | 0.0001
Learning-rate decay interval | 2×10⁴
Batch size | 64
Replay buffer size | 3×10⁵
Initial exploration probability | 0.5
Minimum exploration probability | 0.05
Exploration-probability decay steps | 2×10⁴
Policy update frequency | 2
Tab.2 Hyperparameter settings for TD3 model
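The exploration settings in Tab.2 (initial probability 0.5, minimum 0.05, decay over 2×10⁴ steps) imply a simple annealing schedule. The sketch below assumes linear decay, which the table itself does not specify:

```python
def exploration_prob(step, p0=0.5, p_min=0.05, decay_steps=20_000):
    """Assumed linear anneal from p0 down to the floor p_min over
    decay_steps environment steps (values from Tab.2)."""
    if step >= decay_steps:
        return p_min
    return p0 - (p0 - p_min) * step / decay_steps

print(exploration_prob(0))                  # 0.5
print(round(exploration_prob(10_000), 3))   # 0.275 (halfway through decay)
print(exploration_prob(50_000))             # 0.05 (floor reached)
```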
Fig.6 Training results of vehicle motion planning algorithms
Fig.7 Curves of vehicle motion parameters
Fig.8 Comparison of results of quantitative analysis of four algorithms
Fig.9 Schematic diagram of intersection scenarios with various traffic flow directions and densities
Algorithm | Scenario | t_p/s | P_s/%
IDM | 1 | 16.2 | 67.4
 | 2 | 25.8 | 59.4
TD3 | 1 | 10.5 | 78.2
 | 2 | 16.4 | 65.8
TCN-Transformer-DDPG | 1 | 11.8 | 89.4
 | 2 | 15.6 | 80.2
LSTM-TD3 | 1 | 9.1 | 90.2
 | 2 | 12.6 | 84.8
TCN-Transformer-TD3 | 1 | 8.3 | 94.2
 | 2 | 11.4 | 92.1
Tab.3 Success rates and average passage time of different algorithms in various scenarios
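Reading Tab.3, the proposed TCN-Transformer-TD3 can be compared against the IDM baseline directly from the tabulated numbers; the script below is only a convenience for that comparison (numbers copied from the table, scenarios 1 and 2 being the two traffic settings of Fig.9):

```python
# Per scenario: (average passage time t_p in s, success rate P_s in %).
results = {
    "IDM":                 {1: (16.2, 67.4), 2: (25.8, 59.4)},
    "TCN-Transformer-TD3": {1: (8.3, 94.2),  2: (11.4, 92.1)},
}

for sc in (1, 2):
    t_idm, p_idm = results["IDM"][sc]
    t_td3, p_td3 = results["TCN-Transformer-TD3"][sc]
    cut = 100 * (t_idm - t_td3) / t_idm          # relative passage-time reduction
    gain = p_td3 - p_idm                         # success-rate gain in percentage points
    print(f"scenario {sc}: passage time cut {cut:.1f}%, success rate +{gain:.1f} pp")
```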
[1]   WANG C, XIE Y, HUANG H, et al. A review of surrogate safety measures and their applications in connected and automated vehicles safety modeling[J]. Accident Analysis and Prevention, 2021, 157: 106157
doi: 10.1016/j.aap.2021.106157
[2]   QIAN Lijun, CHEN Chen, CHEN Jian, et al. Discrete platoon control at an unsignalized intersection based on Q-learning model[J]. Automotive Engineering, 2022, 44(9): 1350-1358
[3]   SUN Qipeng, WU Zhigang, CAO Ningbo, et al. Decision-making model of autonomous vehicle behavior based on risk prediction[J]. Journal of Zhejiang University: Engineering Science, 2022, 56(9): 1761-1771
[4]   KESTING A, TREIBER M, HELBING D. Enhanced intelligent driver model to access the impact of driving strategies on traffic capacity[J]. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 2010, 368(1928): 4585-4605
doi: 10.1098/rsta.2010.0084
[5]   CHEN Xiufeng, GAO Yanyan, SHI Yingjie, et al. Curve car-following model based on optimal velocity and its stability analysis[J]. Journal of Chongqing Jiaotong University: Natural Science, 2020, 39(1): 126-130
[6]   SUN Huihui, HU Chunhe, ZHANG Junguo. Deep reinforcement learning for motion planning of mobile robots[J]. Control and Decision, 2021, 36(6): 1281-1292
[7]   YANG Z, ZHANG Y, YU J, et al. End-to-end multi-modal multi-task vehicle control for self-driving cars with visual perceptions [C]// IEEE International Conference on Pattern Recognition . Beijing: IEEE, 2018: 2289−2294.
[8]   THU N T H, HAN D S. An end-to-end motion planner using sensor fusion for autonomous driving [C]// IEEE International Conference on Artificial Intelligence in Information and Communication . Bali: IEEE, 2023: 678−683.
[9]   ISELE D, RAHIMI R, COSGUN A, et al. Navigating occluded intersections with autonomous vehicles using deep reinforcement learning [C]// IEEE International Conference on Robotics and Automation . Brisbane: IEEE, 2018: 2034−2039.
[10]   GUNARATHNA U, KARUNASEKERA S, BOROVICA-GAJIC R, et al. Real-time intelligent autonomous intersection management using reinforcement learning [C]// IEEE Intelligent Vehicles Symposium . Aachen: IEEE, 2022: 135−144.
[11]   KIRAN B R, SOBH I, TALPAERT V, et al. Deep reinforcement learning for autonomous driving: a survey[J]. IEEE Transactions on Intelligent Transportation Systems, 2021, 23(6): 4909-4926
[12]   KAMRAN D, LOPEZ C F, LAUER M, et al. Risk-aware high-level decisions for automated driving at occluded intersections with reinforcement learning [C]// IEEE Intelligent Vehicles Symposium . Las Vegas: IEEE, 2020: 1205−1212.
[13]   CHEN L, HU X, TANG B, et al. Conditional DQN-based motion planning with fuzzy logic for autonomous driving[J]. IEEE Transactions on Intelligent Transportation Systems, 2020, 23(4): 2966-2977
[14]   GAO Zhenhai, YAN Xiangtong, GAO Fei, et al. A driver-like decision-making method for longitudinal autonomous driving based on DDPG[J]. Automotive Engineering, 2021, 43(12): 1737-1744
[15]   DENG Xiaohao, HOU Jin, TAN Guanghong, et al. Multi-objective vehicle following decision algorithm based on reinforcement learning[J]. Control and Decision, 2021, 36(10): 2497-2503
[16]   LI G, LI S, LI S, et al. Continuous decision-making for autonomous driving at intersections using deep deterministic policy gradient[J]. IET Intelligent Transport Systems, 2022, 16(12): 1669-1681
doi: 10.1049/itr2.12107
[17]   FUJIMOTO S, HOOF H, MEGER D. Addressing function approximation error in actor-critic methods [C]// International Conference on Machine Learning . Stockholm: PMLR, 2018: 1587−1596.
[18]   PEI Xiaofei, MO Shuojie, CHEN Zhenfu, et al. Lane changing of autonomous vehicle based on TD3 algorithm in human-machine hybrid driving environment[J]. China Journal of Highway and Transport, 2021, 34(11): 246-254
doi: 10.3969/j.issn.1001-7372.2021.11.020
[19]   WU Yikai, HU Qizhou, WU Xiaoyu. Vehicle trajectory prediction model in the context of internet of vehicles[J]. Journal of Southeast University: Natural Science Edition, 2022, 52(6): 1199-1208
[20]   AZADANI M N, BOUKERCHE A. Toward driver intention prediction for intelligent vehicles: a deep learning approach [C]// IEEE International Conference on Local Computer Networks . Edmonton: IEEE, 2021: 233−240.
[21]   WANG Jianqiang, WU Jian, LI Yang. Concept, principle and modeling of driving risk field based on driver-vehicle-road interaction[J]. China Journal of Highway and Transport, 2016, 29(1): 105-114
doi: 10.3969/j.issn.1001-7372.2016.01.014
[22]   GAO Zhenhai, YAN Xiangtong, GAO Fei. A decision-making method for longitudinal autonomous driving based on inverse reinforcement learning[J]. Automotive Engineering, 2022, 44(7): 969-975
[23]   LIU Qiran, LIAN Jing, CHEN Shi, et al. Motion planning algorithm of autonomous driving considering interactive trajectory prediction[J]. Journal of Northeastern University: Natural Science, 2022, 43(7): 930-936