Journal of ZheJiang University (Engineering Science)  2022, Vol. 56 Issue (5): 987-994, 1005    DOI: 10.3785/j.issn.1008-973X.2022.05.016
    
Cooperative control algorithm of multi-intersection variable-direction lanes based on reinforcement learning
Xiao-gao XU1, Ying-jie XIA1,*, Si-yu ZHU1, Li KUANG2
1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
2. School of Computer Science and Engineering, Central South University, Changsha 410012, China

Abstract  

A cooperative control algorithm for variable-direction lanes at multiple intersections, based on multi-agent reinforcement learning, was proposed to alleviate multi-intersection congestion and to address the inability of traditional variable-direction lane control methods to adapt to complex traffic flow in multi-intersection scenarios. The deep multi-agent reinforcement learning algorithm QMIX was improved. To address global reward allocation in variable-direction lane scenarios, the global reward was decomposed into a basic reward and a performance reward, which improved the accuracy of lane-direction switching decisions in congestion scenarios. Prioritized experience replay was introduced to improve the utilization efficiency of transition sequences in the experience replay buffer and to accelerate convergence. Experimental results show that the proposed algorithm outperforms other control methods in terms of queue length, delay time and waiting time, and that it can effectively coordinate policy switching of variable-direction lanes and improve road network capacity in multi-intersection scenarios.
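The replay mechanism mentioned in the abstract can be illustrated with a generic sketch of proportional prioritized replay. This is not the authors' implementation: the class name `PrioritizedReplayBuffer`, the parameters `alpha`, `beta` and `eps`, and the list-based storage are all assumptions chosen for readability, and stored items are treated as opaque objects (in a QMIX setting they would typically be whole transition sequences).

```python
import random


class PrioritizedReplayBuffer:
    """Hypothetical sketch: sample item i with probability p_i^alpha / sum_k p_k^alpha."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha  # 0 gives uniform sampling, 1 gives fully proportional sampling
        self.eps = eps      # keeps zero-error items sampleable
        self.items = []
        self.priorities = []
        self.next_slot = 0

    def add(self, item):
        # New items get the current maximum priority so they are likely to be replayed soon.
        priority = max(self.priorities, default=1.0)
        if len(self.items) < self.capacity:
            self.items.append(item)
            self.priorities.append(priority)
        else:
            # Buffer is full: overwrite the oldest slot (ring buffer).
            self.items[self.next_slot] = item
            self.priorities[self.next_slot] = priority
        self.next_slot = (self.next_slot + 1) % self.capacity

    def probabilities(self):
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        return [s / total for s in scaled]

    def sample(self, batch_size, beta=0.4, rng=random):
        probs = self.probabilities()
        n = len(self.items)
        indices = rng.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights offset the bias of non-uniform sampling;
        # here they are normalized by the largest weight in the batch.
        weights = [(n * probs[i]) ** (-beta) for i in indices]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return indices, [self.items[i] for i in indices], weights

    def update(self, indices, td_errors):
        # After a learning step, priorities follow the magnitude of the TD error.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

A training loop would call `sample`, scale each item's loss by the returned weight, and then call `update` with the new TD errors. A linear scan per sample is acceptable for a sketch; larger buffers commonly use a sum-tree instead.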



Key words: variable-direction lane, reinforcement learning, multi-agent, adaptive control, intelligent transportation
Received: 23 May 2021      Published: 31 May 2022
CLC:  TP 391  
Fund:  National Natural Science Foundation of China (61873232)
Corresponding Authors: Ying-jie XIA     E-mail: 21821323@zju.edu.cn;xiayingjie@zju.edu.cn
Cite this article:

Xiao-gao XU,Ying-jie XIA,Si-yu ZHU,Li KUANG. Cooperative control algorithm of multi-intersection variable-direction lanes based on reinforcement learning. Journal of ZheJiang University (Engineering Science), 2022, 56(5): 987-994, 1005.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2022.05.016     OR     https://www.zjujournals.com/eng/Y2022/V56/I5/987


Fig.1 Architecture of cooperative control method of multi-intersection variable-direction lanes
Fig.2 Multi-agent reinforcement learning model
Fig.3 Schematic diagram of vehicle location matrix
Fig.4 Global reward decomposition algorithm
Fig.5 Structure diagram of experimental road network area
Fig.6 Comparison results of algorithm reward indicators in test set
Fig.7 Comparison results of each traffic index in test set
Fig.8 Comparison of reward indicators in training process of multi-agent reinforcement learning algorithm
Fig.9 Comparison of traffic indicators during training of multi-agent reinforcement learning algorithm