Journal of Zhejiang University (Engineering Science)  2023, Vol. 57, Issue (12): 2524-2532    DOI: 10.3785/j.issn.1008-973X.2023.12.019
Traffic Engineering
Deep reinforcement learning approach to signal control combined with domain experience
Meng ZHANG, Dian-hai WANG, Sheng JIN*
College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China

Abstract:

To address the problems of unstable training, slow convergence, and frequent phase changes in deep-reinforcement-learning-based signal control methods, a signal control method combined with domain experience was proposed by introducing a pre-training module and a phase green time calculation module into the double dueling deep Q-network (3DQN) algorithm. The pre-training module guided the 3DQN agent to imitate the policy of the Max-Pressure method by jointly optimizing the double Q-learning loss, a supervised margin classification loss, and a regularization loss, thereby stabilizing and accelerating the agent's training. The phase green time calculation module dynamically adjusted the green time of the current phase according to the average time headway and the queue length, so as to reduce lost green time. The intersection of Airport City Avenue and Boao Road in Xiaoshan District, Hangzhou, was taken as an example to validate the proposed method on the simulation platform SUMO. Experimental results show that the proposed method effectively improves the training speed of the traditional 3DQN algorithm; compared with conventional control methods, it significantly reduces the average vehicle travel time and improves the operational efficiency of the intersection.
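The pre-training described above follows the deep Q-learning from demonstrations (DQfD) recipe of Hester et al. [19]. Below is a minimal PyTorch sketch of the three losses named in the abstract, assuming demonstration actions produced by a Max-Pressure controller; all names (q_net, target_net, a_demo) and hyperparameter values are illustrative assumptions, not the paper's.

```python
# A minimal sketch (PyTorch) of the combined pre-training loss, in the
# style of DQfD [19]. Names and hyperparameters are assumptions.
import torch
import torch.nn.functional as F

def pretraining_loss(q_net, target_net, batch,
                     gamma=0.99, margin=0.8, l2_weight=1e-5):
    # batch: states s, taken actions a, rewards r, next states s2,
    # done flags, and demonstration actions a_demo from Max-Pressure.
    s, a, r, s2, done, a_demo = batch

    q = q_net(s)                                    # Q(s, .), shape [B, n_actions]
    q_sa = q.gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s, a)

    # 1) Double Q-learning loss: the online net selects the next action,
    #    the target net evaluates it.
    with torch.no_grad():
        a_star = q_net(s2).argmax(dim=1, keepdim=True)
        y = r + gamma * (1.0 - done) * target_net(s2).gather(1, a_star).squeeze(1)
    td_loss = F.smooth_l1_loss(q_sa, y)

    # 2) Supervised margin classification loss:
    #    max_a [Q(s,a) + m(a, a_demo)] - Q(s, a_demo), with m = margin for
    #    a != a_demo and 0 otherwise, pushing the demonstrated action on top.
    m = torch.full_like(q, margin)
    m.scatter_(1, a_demo.unsqueeze(1), 0.0)
    q_demo = q.gather(1, a_demo.unsqueeze(1)).squeeze(1)
    margin_loss = ((q + m).max(dim=1).values - q_demo).mean()

    # 3) L2 regularization on the online network's parameters.
    reg = sum((p ** 2).sum() for p in q_net.parameters())

    return td_loss + margin_loss + l2_weight * reg
```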

Key words: traffic signal control; reinforcement learning; deep reinforcement learning; supervised learning; pre-training
Received: 2023-03-16    Published: 2023-12-27
CLC: U491.4
Funding: Supported by the National Natural Science Foundation of China (52131202, 52072340, 71901193) and the Zhejiang Provincial Natural Science Foundation for Distinguished Young Scholars (LR23E080002)
Corresponding author: Sheng JIN    E-mail: 22112093@zju.edu.cn; jinsheng@zju.edu.cn
Author profile: Meng ZHANG (1998—), male, master's student, researching traffic information engineering and control. orcid.org/0000-0003-3270-1920. E-mail: 22112093@zju.edu.cn

Cite this article:


Meng ZHANG, Dian-hai WANG, Sheng JIN. Deep reinforcement learning approach to signal control combined with domain experience [J]. Journal of Zhejiang University (Engineering Science), 2023, 57(12): 2524-2532.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2023.12.019        https://www.zjujournals.com/eng/CN/Y2023/V57/I12/2524

Fig. 1  Schematic diagram of the deep reinforcement learning model
Fig. 2  State vector table
Fig. 3  Set of intersection phase schemes
Fig. 4  Framework of the signal control model based on the double dueling deep Q-network
Fig. 5  All-day traffic flow distribution at the intersection
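As a rough illustration of the network structure sketched in Fig. 4, the following is a minimal dueling Q-network head in PyTorch; the layer sizes and names are illustrative assumptions, not the paper's architecture.

```python
# A sketch of a dueling Q-network head of the kind shown in Fig. 4.
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        # Shared trunk over the state vector (Fig. 2).
        self.trunk = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.value = nn.Linear(hidden, 1)          # state value V(s)
        self.adv = nn.Linear(hidden, n_actions)    # advantages A(s, a)

    def forward(self, s):
        h = self.trunk(s)
        v, a = self.value(h), self.adv(h)
        # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a); subtracting the mean
        # advantage makes the value/advantage split identifiable.
        return v + a - a.mean(dim=1, keepdim=True)
```

Combined with double Q-learning (the online net selects the next action, the target net evaluates it), this dueling decomposition is what the 3DQN shorthand usually denotes.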
Vehicle type    N    P/%
Car    2 606    63.07
Minibus    1 440    34.85
Coach    22    0.53
Bus    5    0.12
Taxi    7    0.17
Truck    48    1.16
Engineering vehicle    4    0.10
Table 1  Distribution of vehicle types passing the intersection during peak hours (N: number of vehicles; P: proportion)
Vehicle type    L/m    D
Car    5.0    Normc(1, 0.1, 0.5, 2)
Minibus    6.5    Normc(1, 0.1, 0.5, 2)
Coach    14.0    Normc(1, 0.05, 0.5, 2)
Bus    12.0    Normc(1, 0.05, 0.5, 2)
Taxi    5.0    Normc(1, 0.1, 0.5, 2)
Truck    7.1    Normc(1, 0.05, 0.5, 2)
Engineering vehicle    16.5    Normc(1, 0.05, 0.5, 2)
Table 2  Vehicle type settings in the simulation experiments (L: vehicle length; D: speed factor distribution)
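The D column of Table 2 matches the syntax SUMO uses for capped normal distributions, normc(mean, dev, min, max), as applied to the per-vehicle speed factor. Below is a minimal sketch of how the table could be written out as a SUMO vehicle-type file; the type IDs and the file name are illustrative assumptions.

```python
# A sketch mapping Table 2 onto SUMO <vType> definitions. The speedFactor
# attribute accepts a capped normal distribution normc(mean, dev, min, max).
VEHICLE_TYPES = [
    # (id, length L/m, speed factor distribution D)
    ("car",          5.0,  "normc(1,0.1,0.5,2)"),
    ("minibus",      6.5,  "normc(1,0.1,0.5,2)"),
    ("coach",       14.0,  "normc(1,0.05,0.5,2)"),
    ("bus",         12.0,  "normc(1,0.05,0.5,2)"),
    ("taxi",         5.0,  "normc(1,0.1,0.5,2)"),
    ("truck",        7.1,  "normc(1,0.05,0.5,2)"),
    ("engineering", 16.5,  "normc(1,0.05,0.5,2)"),
]

# Write an additional-file that a SUMO scenario can load alongside routes.
with open("vtypes.add.xml", "w") as f:
    f.write("<additional>\n")
    for vid, length, dist in VEHICLE_TYPES:
        f.write(f'  <vType id="{vid}" length="{length}" speedFactor="{dist}"/>\n')
    f.write("</additional>\n")
```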
Algorithm    tw/s    tt/s    v/(m·s−1)
Webster    35.89    113.46    7.45
Actuated    33.77    110.04    7.70
Delay-Based    64.85    142.52    6.36
3DQN    15.01    90.75    8.96
Proposed method    13.00    87.39    9.30
Table 3  Comparison of control performance of different methods (tw: average waiting time; tt: average travel time; v: average speed)
Algorithm    Ln/m    Ls/m    Le/m    Lw/m
Webster    18.73    33.02    37.06    10.77
Actuated    16.66    26.11    22.85    9.70
Delay-Based    21.58    68.22    62.01    14.50
3DQN    10.53    19.81    19.85    8.29
Proposed method    9.90    17.80    16.81    9.32
Table 4  Average queue length on each approach under different methods (Ln, Ls, Le, Lw: north, south, east, and west approaches)
Fig. 6  Variation of the average vehicle waiting time in the network under different methods
Fig. 7  Variation of the average travel time during the morning peak over one week under different methods
Fig. 8  Convergence speed of the double dueling deep Q-network algorithm with and without the pre-training module
Fig. 9  Convergence of the average travel time for the double dueling deep Q-network (3DQN) algorithm with the phase duration module versus the traditional 3DQN algorithm
Fig. 10  Model action strategies under different interval time settings
Fig. 11  Comparison of the total green time of each phase for different double dueling deep Q-network algorithms
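The phase green time module evaluated in Fig. 9 is described in the abstract only at a high level: green time is adjusted from the average time headway and the queue length of the current phase. The following is a hedged sketch of one such rule, assuming the queue is converted to a vehicle count and discharged at the measured headway; every constant below is an assumption, not the paper's value.

```python
# A hedged sketch of a headway- and queue-driven green time rule.
def phase_green_time(queue_len_m, avg_headway_s,
                     veh_spacing_m=7.5, startup_lost_s=3.0,
                     g_min=10.0, g_max=60.0):
    n_veh = queue_len_m / veh_spacing_m           # queued vehicles, from queue length
    g = startup_lost_s + n_veh * avg_headway_s    # estimated queue discharge time
    return min(max(g, g_min), g_max)              # clip to min/max green

# Example: a 35 m queue discharging at a 2.2 s headway gets ~13.3 s of green.
print(phase_green_time(35.0, 2.2))
```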
1 WEBSTER F V. Traffic signal settings [R]. London: Road Research Laboratory, 1958.
2 LUO Xiao-qin, WANG Dian-hai, JIN Sheng. Traffic signal actuated control at isolated intersections for heterogeneous traffic [J]. Journal of Jilin University: Engineering and Technology Edition, 2019, 49(3): 695-704.
3 HUNT P, ROBERTSON D, BRETHERTON R, et al. The SCOOT on-line traffic signal optimisation technique [J]. Traffic Engineering and Control, 1982, 23(4): 190-192.
4 GENDERS W, RAZAVI S. Using a deep reinforcement learning agent for traffic signal control [EB/OL]. (2016-11-03) [2023-03-12]. https://arxiv.org/pdf/1611.01142v1.pdf.
5 LI L, LV Y, WANG F Y. Traffic signal timing via deep reinforcement learning [J]. IEEE/CAA Journal of Automatica Sinica, 2016, 3(3): 247-254.
doi: 10.1109/JAS.2016.7508798
6 GAO J, SHEN Y, LIU J, et al. Adaptive traffic signal control: deep reinforcement learning algorithm with experience replay and target network [EB/OL]. (2017-05-08) [2023-03-12]. https://arxiv.org/pdf/1705.02755.pdf.
7 MOUSAVI S S, SCHUKAT M, HOWLEY E. Traffic light control using deep policy-gradient and value-function-based reinforcement learning [J]. IET Intelligent Transport Systems, 2017, 11(7): 417-423.
doi: 10.1049/iet-its.2017.0153
8 WEI H, ZHENG G, YAO H, et al. IntelliLight: a reinforcement learning approach for intelligent traffic light control [C]// Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. [S.l.]: ACM, 2018: 2496-2505.
9 LIANG X, DU X, WANG G, et al. A deep reinforcement learning network for traffic light cycle control [J]. IEEE Transactions on Vehicular Technology, 2019, 68(2): 1243-1253.
doi: 10.1109/TVT.2018.2890726
10 WANG Z, SCHAUL T, HESSEL M, et al. Dueling network architectures for deep reinforcement learning [C]// International Conference on Machine Learning. [S.l.]: Journal of Machine Learning Research, 2016: 1995-2003.
11 SUN Hao, CHEN Chun-lin, LIU Qiong, et al. Traffic signal control method based on deep reinforcement learning [J]. Computer Science, 2020, 47(2): 169-174.
12 LIU Zhi, CAO Shi-peng, SHEN Yang, et al. Signal control of single intersection based on improved deep reinforcement learning method [J]. Computer Science, 2020, 47(12): 226-232.
13 LIU Zhi-min, YE Bao-lin, ZHU Yao-dong, et al. Traffic signal control method based on deep reinforcement learning [J]. Journal of Zhejiang University: Engineering Science, 2022, 56(6): 1249-1256.
14 ZHAO Qian, ZHANG Ling, ZHAO Gang, et al. Reinforcement learning traffic signal control under double-loop phase-structure constraints [J]. Journal of Transportation Engineering and Information, 2023, 21(1): 19-28.
doi: 10.19961/j.cnki.1672-4747.2022.05.010
15 CHU T, WANG J, CODECÀ L, et al. Multi-agent deep reinforcement learning for large-scale traffic signal control [J]. IEEE Transactions on Intelligent Transportation Systems, 2019, 21(3): 1086-1095.
16 LI Z, YU H, ZHANG G, et al. Network-wide traffic signal control optimization using a multi-agent deep reinforcement learning [J]. Transportation Research Part C: Emerging Technologies, 2021, 125: 103059.
doi: 10.1016/j.trc.2021.103059
17 ZHENG G, ZANG X, XU N, et al. Diagnosing reinforcement learning for traffic signal control [EB/OL]. (2019-05-12) [2023-03-12]. https://arxiv.org/pdf/1905.04716.pdf.
18 WEI H, CHEN C, ZHENG G, et al. PressLight: learning max pressure control to coordinate traffic signals in arterial network [C]// Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. [S.l.]: ACM, 2019: 1290-1298.
19 HESTER T, VECERIK M, PIETQUIN O, et al. Deep Q-learning from demonstrations [C]// Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence. [S.l.]: AAAI Press, 2018: 3223-3230.
20 VARAIYA P. The max-pressure controller for arbitrary networks of signalized intersections [M]// UKKUSURI S, OZBAY K. Advances in dynamic network modeling in complex transportation systems. [S.l.]: Springer, 2013: 27-66.