Cooperative control algorithm of multi-intersection variable-direction lanes based on reinforcement learning

doi:10.3785/j.issn.1008-973X.2022.05.016

Journal of Zhejiang University (Engineering Science)

2022, Vol. 56, Issue (5): 987-994, 1005

Computer and Control Engineering

Xiao-gao XU (徐小高)1, Ying-jie XIA (夏莹杰)1,*, Si-yu ZHU (朱思雨)1, Li KUANG (邝砾)2

1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
2. School of Computer Science and Engineering, Central South University, Changsha 410012, China


Abstract:

Traditional variable-direction lane control methods cannot adapt to the complex traffic flow that arises in multi-intersection scenarios. To address this problem, a cooperative control method for multi-intersection variable-direction lanes based on multi-agent reinforcement learning was proposed to alleviate congestion across multiple intersections. The method improves the multi-agent reinforcement learning algorithm QMIX. To handle the global reward allocation problem in variable-direction lane scenarios, the global reward was decomposed into a basic reward and a performance reward, which improved the accuracy of lane-direction switching decisions under congestion. A prioritized experience replay algorithm was introduced to improve the utilization efficiency of the transition sequences in the experience replay buffer and to accelerate convergence. Experimental results show that the proposed method outperforms other control methods in terms of queue length, delay time and waiting time, and that it can effectively coordinate the policy switching of variable-direction lanes and improve the capacity of the road network in multi-intersection scenarios.

Key words: variable-direction lane; reinforcement learning; multi-agent; adaptive control; intelligent transportation
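The abstract mentions prioritized experience replay but this page does not give the authors' implementation. As a rough illustration only, the sketch below shows the commonly used proportional variant, in which a transition with priority p_i is sampled with probability p_i^alpha / sum_k p_k^alpha and its update is scaled by an importance-sampling weight. The class name, the list-based storage and the values of alpha, beta and eps are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


class PrioritizedReplayBuffer:
    """Proportional prioritized replay (hypothetical, list-based, O(N) sampling)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-5, seed=0):
        self.capacity = capacity
        self.alpha = alpha          # 0 gives uniform sampling, 1 gives fully proportional
        self.eps = eps              # keeps every priority strictly positive
        self.rng = np.random.default_rng(seed)
        self.data = []
        self.priorities = []
        self.pos = 0                # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions receive the current maximum priority so that each
        # one has a good chance of being replayed at least once.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def probabilities(self):
        scaled = np.asarray(self.priorities, dtype=float) ** self.alpha
        return scaled / scaled.sum()

    def sample(self, batch_size, beta=0.4):
        probs = self.probabilities()
        idx = self.rng.choice(len(self.data), size=batch_size, p=probs)
        # Importance-sampling weights correct the bias of non-uniform sampling;
        # here they are normalized by the largest weight in the batch.
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights /= weights.max()
        batch = [self.data[i] for i in idx]
        return idx, batch, weights

    def update_priorities(self, idx, td_errors):
        # After a learning step, the priority becomes |TD error| + eps.
        for i, delta in zip(idx, td_errors):
            self.priorities[int(i)] = abs(float(delta)) + self.eps


# Example use with placeholder transitions.
buffer = PrioritizedReplayBuffer(capacity=4)
for step in range(4):
    buffer.add(("placeholder_transition", step))
buffer.update_priorities([0, 1, 2, 3], [0.1, 0.1, 0.1, 2.0])
indices, batch, is_weights = buffer.sample(batch_size=2)
```

A production implementation would normally replace the lists with a sum tree so that sampling costs O(log N); the list version is kept here only because it is easier to read.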

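Received: 2021-05-23 Published: 2022-05-31

CLC: TP 391

Fund: National Natural Science Foundation of China (61873232)

Corresponding author: Ying-jie XIA (夏莹杰) E-mail: 21821323@zju.edu.cn; xiayingjie@zju.edu.cn

About the author: Xiao-gao XU (徐小高, born 1997), male, master's degree, engaged in research on intelligent transportation and reinforcement learning. orcid.org/0000-0003-4698-9242. E-mail: 21821323@zju.edu.cn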


Cite this article:

Xiao-gao XU, Ying-jie XIA, Si-yu ZHU, Li KUANG. Cooperative control algorithm of multi-intersection variable-direction lanes based on reinforcement learning[J]. Journal of Zhejiang University (Engineering Science), 2022, 56(5): 987-994, 1005.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2022.05.016 or https://www.zjujournals.com/eng/CN/Y2022/V56/I5/987

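Fig. 1 Overall architecture of the multi-intersection variable-direction lane cooperative control method

Fig. 2 Multi-agent reinforcement learning model

Fig. 3 Schematic diagram of the vehicle position matrix

Fig. 4 Global reward decomposition algorithm

Fig. 5 Structure of the experimental road network area

Fig. 6 Comparison of algorithm reward metrics on the test set

Fig. 7 Comparison of traffic metrics on the test set

Fig. 8 Comparison of reward metrics during training of the multi-agent reinforcement learning algorithms

Fig. 9 Comparison of traffic metrics during training of the multi-agent reinforcement learning algorithms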
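The caption of Fig. 3 refers to a vehicle position matrix, but the figure itself and its exact encoding are not reproduced on this page. One common way to build such a state representation is to divide each approach lane into fixed-length cells and mark a cell as occupied when a vehicle is inside it. The sketch below follows that convention; the function name, the (lane index, distance to stop line) input format and the cell length are all assumptions for illustration and may differ from what the authors used.

```python
import numpy as np


def vehicle_position_matrix(vehicles, num_lanes, lane_length_m, cell_length_m):
    """Build a binary occupancy grid of shape (num_lanes, num_cells).

    vehicles: iterable of (lane_index, distance_to_stop_line_m) pairs
              (hypothetical input format chosen for this sketch).
    """
    num_cells = int(np.ceil(lane_length_m / cell_length_m))
    grid = np.zeros((num_lanes, num_cells), dtype=np.int8)
    for lane, distance_m in vehicles:
        # Vehicles outside the observed stretch of the lane are ignored.
        if 0 <= lane < num_lanes and 0.0 <= distance_m < lane_length_m:
            cell = int(distance_m // cell_length_m)
            grid[lane, cell] = 1
    return grid


# Example: three lanes, 50 m observed per lane, 5 m cells.
observed = [(0, 2.0), (0, 12.5), (2, 49.9), (1, 60.0)]
state = vehicle_position_matrix(observed, num_lanes=3,
                                lane_length_m=50.0, cell_length_m=5.0)
```

Column 0 corresponds to the cell nearest the stop line in this sketch, so a queue appears as a run of ones starting at the left edge of a row.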


