Journal of ZheJiang University (Engineering Science)  2024, Vol. 58 Issue (9): 1923-1934    DOI: 10.3785/j.issn.1008-973X.2024.09.017
    
Intelligent connected vehicle motion planning at unsignalized intersections based on deep reinforcement learning
Mingfang ZHANG1, Jian MA1, Nale ZHAO2, Li WANG1, Ying LIU1
1. Beijing Key Laboratory of Urban Road Intelligent Traffic Control Technology, North China University of Technology, Beijing 100144, China
2. Key Laboratory of Road Safety Technology of Transport Industry, Research Institute of Highway, Ministry of Transport, Beijing 100088, China

Abstract  

A vehicle motion planning algorithm based on deep reinforcement learning was proposed to satisfy the efficiency and comfort requirements of intelligent connected vehicles at unsignalized intersections. Temporal convolutional network (TCN) and Transformer algorithms were combined to construct the intention prediction model for surrounding vehicles, in which multi-layer convolution and self-attention mechanisms were used to improve the capability of capturing vehicle motion features. The twin delayed deep deterministic policy gradient (TD3) reinforcement learning algorithm was employed to build the vehicle motion planning model. The state space and reward functions were designed by comprehensively considering the driving intentions and driving styles of surrounding vehicles, the interaction risk, and the comfort of the ego vehicle, so as to enhance understanding of the dynamic environment. Delayed policy updates and target policy smoothing were adopted to improve the stability of the proposed algorithm, and the desired acceleration was output in real time. Experimental results demonstrated that the proposed motion planning algorithm could perceive potential interaction risks in real time based on the driving intentions of surrounding vehicles. The generated motion planning strategy met the requirements of efficiency, safety, and comfort, showed excellent adaptability to surrounding vehicles of different driving styles and to dense interaction scenarios, and the success rate exceeded 92.1% in all scenarios.
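The two TD3 stabilizers named in the abstract, target policy smoothing and the twin-critic (clipped double-Q) target, can be sketched in a few lines. This is an illustrative sketch only, not the paper's implementation; the action bounds and noise constants are assumptions, with the action taken to be the scalar desired acceleration the model outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

def smoothed_target_action(target_policy_action, noise_std=0.2, noise_clip=0.5,
                           a_low=-3.0, a_high=3.0):
    """Target policy smoothing: perturb the target action with clipped
    Gaussian noise, then clip to the (assumed) acceleration bounds."""
    noise = np.clip(rng.normal(0.0, noise_std), -noise_clip, noise_clip)
    return float(np.clip(target_policy_action + noise, a_low, a_high))

def td_target(reward, q1_next, q2_next, gamma=0.99, done=False):
    """Clipped double-Q: bootstrap from the minimum of the twin critics,
    which suppresses the overestimation bias of a single critic."""
    q_next = min(q1_next, q2_next)
    return reward + (0.0 if done else gamma * q_next)

# Example: one TD target for a transition with reward 1.0; the smaller
# critic estimate (9.0) is the one that gets bootstrapped.
a_t = smoothed_target_action(1.5)
y = td_target(reward=1.0, q1_next=10.0, q2_next=9.0)
print(round(y, 2))  # 1.0 + 0.99 * 9.0 = 9.91
```

The third TD3 ingredient, delayed policy updates, simply means the actor (and target networks) are updated once every few critic updates; Tab.2's policy update frequency of 2 corresponds to one actor update per two critic updates.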



Key words: intelligent connected vehicle; deep reinforcement learning; unsignalized intersection; intention prediction; motion planning
Received: 29 July 2023      Published: 30 August 2024
CLC:  V 467.1  
Fund:  National Key Research and Development Program of China (2022YFB4300400); Scientific Research Program of Beijing Municipal Education Commission (KM202210009013); China-Ukraine Cooperation Special Program (106051360024XN017-02).
Cite this article:

Mingfang ZHANG, Jian MA, Nale ZHAO, Li WANG, Ying LIU. Intelligent connected vehicle motion planning at unsignalized intersections based on deep reinforcement learning. Journal of ZheJiang University (Engineering Science), 2024, 58(9): 1923-1934.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2024.09.017     OR     https://www.zjujournals.com/eng/Y2024/V58/I9/1923


Fig.1 Network architecture of TCN-Transformer
Fig.2 Structure of TD3 algorithm
Fig.3 Diagram of unsignalized intersection
Algorithm | Actual intent | Predicted: straight | right turn | left turn | ${p_{\mathrm{v}}}$/% | FPS | ${P_{\mathrm{m}}}$
CNN-Transformer | straight | 149 | 20 | 21 | 93.1 | 333 | 354371
 | right turn | 6 | 134 | 2 | 83.8 | |
 | left turn | 5 | 6 | 137 | 85.6 | |
LSTM | straight | 153 | 18 | 14 | 95.6 | 300 | 88003
 | right turn | 5 | 141 | 2 | 88.1 | |
 | left turn | 2 | 1 | 144 | 90.0 | |
TCN-Transformer | straight | 157 | 5 | 4 | 98.1 | 250 | 454211
 | right turn | 3 | 154 | 0 | 96.3 | |
 | left turn | 0 | 4 | 156 | 97.5 | |
Tab.1 Comparison of intent prediction results from different algorithms
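One way to sanity-check Tab.1 is to recover each accuracy ${p_{\mathrm{v}}}$ from the confusion counts: for the CNN-Transformer rows, it equals the diagonal count divided by the corresponding column total (160 samples per class). This is our reading of the table, not code from the paper.

```python
import numpy as np

# CNN-Transformer block of Tab.1.
# Rows: actual intent (straight, right turn, left turn); columns: predicted.
m = np.array([[149, 20, 21],
              [  6, 134,  2],
              [  5,   6, 137]])

# p_v: diagonal (correct) count over the per-class total of 160, in percent.
p_v = 100.0 * np.diag(m) / m.sum(axis=0)
print(np.round(p_v, 1))  # [93.1 83.8 85.6], matching Tab.1
```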
Fig.4 Intent prediction results of surrounding vehicles
Fig.5 Training scenario of vehicle motion planning algorithms
Parameter | Value
Discount factor | 0.99
Actor network learning rate | 0.00002
Critic network learning rate | 0.0001
Learning-rate decay interval | 2×10⁴
Batch size | 64
Replay buffer size | 3×10⁵
Initial exploration probability | 0.5
Minimum exploration probability | 0.05
Exploration-probability decay steps | 2×10⁴
Policy update frequency | 2
Tab.2 Hyperparameter settings for TD3 model
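The exploration settings in Tab.2 (initial probability 0.5, minimum 0.05, decay over 2×10⁴ steps) imply a simple annealing schedule. The sketch below assumes linear decay, which the table itself does not specify:

```python
def exploration_prob(step, p0=0.5, p_min=0.05, decay_steps=20_000):
    """Assumed linear anneal from p0 down to the floor p_min over
    decay_steps environment steps (values from Tab.2)."""
    if step >= decay_steps:
        return p_min
    return p0 - (p0 - p_min) * step / decay_steps

print(exploration_prob(0))                  # 0.5
print(round(exploration_prob(10_000), 3))   # 0.275 (halfway through decay)
print(exploration_prob(50_000))             # 0.05 (floor reached)
```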
Fig.6 Training results of vehicle motion planning algorithms
Fig.7 Curves of vehicle motion parameters
Fig.8 Comparison of results of quantitative analysis of four algorithms
Fig.9 Schematic diagram of intersection scenarios with various traffic flow directions and densities
Algorithm | Scenario | t_p/s | P_s/%
IDM | 1 | 16.2 | 67.4
 | 2 | 25.8 | 59.4
TD3 | 1 | 10.5 | 78.2
 | 2 | 16.4 | 65.8
TCN-Transformer-DDPG | 1 | 11.8 | 89.4
 | 2 | 15.6 | 80.2
LSTM-TD3 | 1 | 9.1 | 90.2
 | 2 | 12.6 | 84.8
TCN-Transformer-TD3 | 1 | 8.3 | 94.2
 | 2 | 11.4 | 92.1
Tab.3 Success rates and average passage time of different algorithms in various scenarios
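Reading Tab.3, the proposed TCN-Transformer-TD3 can be compared against the IDM baseline directly from the tabulated numbers; the script below is only a convenience for that comparison (numbers copied from the table, scenarios 1 and 2 being the two traffic settings of Fig.9):

```python
# Per scenario: (average passage time t_p in s, success rate P_s in %).
results = {
    "IDM":                 {1: (16.2, 67.4), 2: (25.8, 59.4)},
    "TCN-Transformer-TD3": {1: (8.3, 94.2),  2: (11.4, 92.1)},
}

for sc in (1, 2):
    t_idm, p_idm = results["IDM"][sc]
    t_td3, p_td3 = results["TCN-Transformer-TD3"][sc]
    cut = 100 * (t_idm - t_td3) / t_idm          # relative passage-time reduction
    gain = p_td3 - p_idm                         # success-rate gain in percentage points
    print(f"scenario {sc}: passage time cut {cut:.1f}%, success rate +{gain:.1f} pp")
```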
[1]   WANG C, XIE Y, HUANG H, et al. A review of surrogate safety measures and their applications in connected and automated vehicles safety modeling[J]. Accident Analysis and Prevention, 2021, 157: 106157
doi: 10.1016/j.aap.2021.106157
[2]   QIAN Lijun, CHEN Chen, CHEN Jian, et al. Discrete platoon control at an unsignalized intersection based on Q-learning model[J]. Automotive Engineering, 2022, 44(9): 1350-1358
[3]   SUN Qipeng, WU Zhigang, CAO Ningbo, et al. Decision-making model of autonomous vehicle behavior based on risk prediction[J]. Journal of Zhejiang University: Engineering Science, 2022, 56(9): 1761-1771
[4]   KESTING A, TREIBER M, HELBING D. Enhanced intelligent driver model to access the impact of driving strategies on traffic capacity[J]. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 2010, 368(1928): 4585-4605
doi: 10.1098/rsta.2010.0084
[5]   CHEN Xiufeng, GAO Yanyan, SHI Yingjie, et al. Curve car-following model based on optimal velocity and its stability analysis[J]. Journal of Chongqing Jiaotong University: Natural Science, 2020, 39(1): 126-130
[6]   SUN Huihui, HU Chunhe, ZHANG Junguo. Deep reinforcement learning for motion planning of mobile robots[J]. Control and Decision, 2021, 36(6): 1281-1292
[7]   YANG Z, ZHANG Y, YU J, et al. End-to-end multi-modal multi-task vehicle control for self-driving cars with visual perceptions [C]// IEEE International Conference on Pattern Recognition . Beijing: IEEE, 2018: 2289−2294.
[8]   THU N T H, HAN D S. An end-to-end motion planner using sensor fusion for autonomous driving [C]// IEEE International Conference on Artificial Intelligence in Information and Communication . Bali: IEEE, 2023: 678−683.
[9]   ISELE D, RAHIMI R, COSGUN A, et al. Navigating occluded intersections with autonomous vehicles using deep reinforcement learning [C]// IEEE International Conference on Robotics and Automation . Brisbane: IEEE, 2018: 2034−2039.
[10]   GUNARATHNA U, KARUNASEKERA S, BOROVICA-GAJIC R, et al. Real-time intelligent autonomous intersection management using reinforcement learning [C]// IEEE Intelligent Vehicles Symposium . Aachen: IEEE, 2022: 135−144.
[11]   KIRAN B R, SOBH I, TALPAERT V, et al. Deep reinforcement learning for autonomous driving: a survey[J]. IEEE Transactions on Intelligent Transportation Systems, 2021, 23(6): 4909-4926
[12]   KAMRAN D, LOPEZ C F, LAUER M, et al. Risk-aware high-level decisions for automated driving at occluded intersections with reinforcement learning [C]// IEEE Intelligent Vehicles Symposium . Las Vegas: IEEE, 2020: 1205−1212.
[13]   CHEN L, HU X, TANG B, et al. Conditional DQN-based motion planning with fuzzy logic for autonomous driving[J]. IEEE Transactions on Intelligent Transportation Systems, 2020, 23(4): 2966-2977
[14]   GAO Zhenhai, YAN Xiangtong, GAO Fei, et al. A driver-like decision-making method for longitudinal autonomous driving based on DDPG[J]. Automotive Engineering, 2021, 43(12): 1737-1744
[15]   DENG Xiaohao, HOU Jin, TAN Guanghong, et al. Multi-objective vehicle following decision algorithm based on reinforcement learning[J]. Control and Decision, 2021, 36(10): 2497-2503
[16]   LI G, LI S, LI S, et al. Continuous decision-making for autonomous driving at intersections using deep deterministic policy gradient[J]. IET Intelligent Transport Systems, 2022, 16(12): 1669-1681
doi: 10.1049/itr2.12107
[17]   FUJIMOTO S, HOOF H, MEGER D. Addressing function approximation error in actor-critic methods [C]// International Conference on Machine Learning . Stockholm: PMLR, 2018: 1587−1596.
[18]   PEI Xiaofei, MO Shuojie, CHEN Zhenfu, et al. Lane changing of autonomous vehicle based on TD3 algorithm in human-machine hybrid driving environment[J]. China Journal of Highway and Transport, 2021, 34(11): 246-254
doi: 10.3969/j.issn.1001-7372.2021.11.020
[19]   WU Yikai, HU Qizhou, WU Xiaoyu. Vehicle trajectory prediction model in the context of internet of vehicles[J]. Journal of Southeast University: Natural Science Edition, 2022, 52(6): 1199-1208
[20]   AZADANI M N, BOUKERCHE A. Toward driver intention prediction for intelligent vehicles: a deep learning approach [C]// IEEE International Conference on Local Computer Networks . Edmonton: IEEE, 2021: 233−240.
[21]   WANG Jianqiang, WU Jian, LI Yang. Concept, principle and modeling of driving risk field based on driver-vehicle-road interaction[J]. China Journal of Highway and Transport, 2016, 29(1): 105-114
doi: 10.3969/j.issn.1001-7372.2016.01.014
[22]   GAO Zhenhai, YAN Xiangtong, GAO Fei. A decision-making method for longitudinal autonomous driving based on inverse reinforcement learning[J]. Automotive Engineering, 2022, 44(7): 969-975
[23]   LIU Qiran, LIAN Jing, CHEN Shi, et al. Motion planning algorithm of autonomous driving considering interactive trajectory prediction[J]. Journal of Northeastern University: Natural Science, 2022, 43(7): 930-936