Journal of Zhejiang University (Engineering Science)  2023, Vol. 57, Issue (12): 2524-2532    DOI: 10.3785/j.issn.1008-973X.2023.12.019
Traffic Engineering
Deep reinforcement learning approach to signal control combined with domain experience
Meng ZHANG, Dian-hai WANG, Sheng JIN*
College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China

Abstract:

To address the problems of unstable training, slow convergence, and frequent phase changes in deep-reinforcement-learning-based signal control methods, a signal control method combined with domain experience was proposed by introducing a pre-training module and a phase green time calculation module into the double dueling deep Q-network (3DQN) algorithm. The pre-training module guided the 3DQN agent to imitate the policy of the Max-Pressure method by jointly optimizing the double Q-learning loss, a supervised margin classification loss, and a regularization loss, thereby stabilizing and accelerating the agent's training. The phase green time calculation module dynamically adjusted the green time of the current phase according to the average time headway and the queue length, so as to reduce lost green time. The intersection of Airport City Avenue and Boao Road in Xiaoshan District, Hangzhou, was taken as an example to validate the proposed method on the simulation platform SUMO. Experimental results show that the proposed method effectively improves the training speed of the traditional 3DQN algorithm; compared with conventional control methods, it significantly reduces the average vehicle travel time and improves the operational efficiency of the intersection.
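The pre-training described above follows the deep Q-learning from demonstrations (DQfD) recipe of Hester et al. [19]. Below is a minimal PyTorch sketch of the three losses named in the abstract, assuming demonstration actions produced by a Max-Pressure controller; all names (q_net, target_net, a_demo) and hyperparameter values are illustrative assumptions, not the paper's.

```python
# A minimal sketch (PyTorch) of the combined pre-training loss, in the
# style of DQfD [19]. Names and hyperparameters are assumptions.
import torch
import torch.nn.functional as F

def pretraining_loss(q_net, target_net, batch,
                     gamma=0.99, margin=0.8, l2_weight=1e-5):
    # batch: states s, taken actions a, rewards r, next states s2,
    # done flags, and demonstration actions a_demo from Max-Pressure.
    s, a, r, s2, done, a_demo = batch

    q = q_net(s)                                    # Q(s, .), shape [B, n_actions]
    q_sa = q.gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s, a)

    # 1) Double Q-learning loss: the online net selects the next action,
    #    the target net evaluates it.
    with torch.no_grad():
        a_star = q_net(s2).argmax(dim=1, keepdim=True)
        y = r + gamma * (1.0 - done) * target_net(s2).gather(1, a_star).squeeze(1)
    td_loss = F.smooth_l1_loss(q_sa, y)

    # 2) Supervised margin classification loss:
    #    max_a [Q(s,a) + m(a, a_demo)] - Q(s, a_demo), with m = margin for
    #    a != a_demo and 0 otherwise, pushing the demonstrated action on top.
    m = torch.full_like(q, margin)
    m.scatter_(1, a_demo.unsqueeze(1), 0.0)
    q_demo = q.gather(1, a_demo.unsqueeze(1)).squeeze(1)
    margin_loss = ((q + m).max(dim=1).values - q_demo).mean()

    # 3) L2 regularization on the online network's parameters.
    reg = sum((p ** 2).sum() for p in q_net.parameters())

    return td_loss + margin_loss + l2_weight * reg
```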

Key words: traffic signal control; reinforcement learning; deep reinforcement learning; supervised learning; pre-training
Received: 2023-03-16    Published: 2023-12-27
CLC: U491.4
Funding: Supported by the National Natural Science Foundation of China (52131202, 52072340, 71901193) and the Zhejiang Provincial Natural Science Foundation for Distinguished Young Scholars (LR23E080002)
Corresponding author: Sheng JIN    E-mail: 22112093@zju.edu.cn; jinsheng@zju.edu.cn
Author profile: Meng ZHANG (1998—), male, master's student, researching traffic information engineering and control. orcid.org/0000-0003-3270-1920. E-mail: 22112093@zju.edu.cn

Cite this article:


Meng ZHANG, Dian-hai WANG, Sheng JIN. Deep reinforcement learning approach to signal control combined with domain experience [J]. Journal of Zhejiang University (Engineering Science), 2023, 57(12): 2524-2532.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2023.12.019        https://www.zjujournals.com/eng/CN/Y2023/V57/I12/2524

Fig. 1  Schematic diagram of the deep reinforcement learning model
Fig. 2  State vector table
Fig. 3  Set of intersection phase schemes
Fig. 4  Framework of the signal control model based on the double dueling deep Q-network
Fig. 5  All-day traffic flow distribution at the intersection
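As a rough illustration of the network structure sketched in Fig. 4, the following is a minimal dueling Q-network head in PyTorch; the layer sizes and names are illustrative assumptions, not the paper's architecture.

```python
# A sketch of a dueling Q-network head of the kind shown in Fig. 4.
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        # Shared trunk over the state vector (Fig. 2).
        self.trunk = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.value = nn.Linear(hidden, 1)          # state value V(s)
        self.adv = nn.Linear(hidden, n_actions)    # advantages A(s, a)

    def forward(self, s):
        h = self.trunk(s)
        v, a = self.value(h), self.adv(h)
        # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a); subtracting the mean
        # advantage makes the value/advantage split identifiable.
        return v + a - a.mean(dim=1, keepdim=True)
```

Combined with double Q-learning (the online net selects the next action, the target net evaluates it), this dueling decomposition is what the 3DQN shorthand usually denotes.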
Vehicle type    N    P/%
Car    2 606    63.07
Minibus    1 440    34.85
Coach    22    0.53
Bus    5    0.12
Taxi    7    0.17
Truck    48    1.16
Engineering vehicle    4    0.10
Table 1  Distribution of vehicle types passing the intersection during peak hours (N: number of vehicles; P: proportion)
Vehicle type    L/m    D
Car    5.0    Normc(1, 0.1, 0.5, 2)
Minibus    6.5    Normc(1, 0.1, 0.5, 2)
Coach    14.0    Normc(1, 0.05, 0.5, 2)
Bus    12.0    Normc(1, 0.05, 0.5, 2)
Taxi    5.0    Normc(1, 0.1, 0.5, 2)
Truck    7.1    Normc(1, 0.05, 0.5, 2)
Engineering vehicle    16.5    Normc(1, 0.05, 0.5, 2)
Table 2  Vehicle type settings in the simulation experiments (L: vehicle length; D: speed factor distribution)
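The D column of Table 2 matches the syntax SUMO uses for capped normal distributions, normc(mean, dev, min, max), as applied to the per-vehicle speed factor. Below is a minimal sketch of how the table could be written out as a SUMO vehicle-type file; the type IDs and the file name are illustrative assumptions.

```python
# A sketch mapping Table 2 onto SUMO <vType> definitions. The speedFactor
# attribute accepts a capped normal distribution normc(mean, dev, min, max).
VEHICLE_TYPES = [
    # (id, length L/m, speed factor distribution D)
    ("car",          5.0,  "normc(1,0.1,0.5,2)"),
    ("minibus",      6.5,  "normc(1,0.1,0.5,2)"),
    ("coach",       14.0,  "normc(1,0.05,0.5,2)"),
    ("bus",         12.0,  "normc(1,0.05,0.5,2)"),
    ("taxi",         5.0,  "normc(1,0.1,0.5,2)"),
    ("truck",        7.1,  "normc(1,0.05,0.5,2)"),
    ("engineering", 16.5,  "normc(1,0.05,0.5,2)"),
]

# Write an additional-file that a SUMO scenario can load alongside routes.
with open("vtypes.add.xml", "w") as f:
    f.write("<additional>\n")
    for vid, length, dist in VEHICLE_TYPES:
        f.write(f'  <vType id="{vid}" length="{length}" speedFactor="{dist}"/>\n')
    f.write("</additional>\n")
```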
Algorithm    tw/s    tt/s    v/(m·s−1)
Webster    35.89    113.46    7.45
Actuated    33.77    110.04    7.70
Delay-Based    64.85    142.52    6.36
3DQN    15.01    90.75    8.96
Proposed method    13.00    87.39    9.30
Table 3  Comparison of control performance of different methods (tw: average waiting time; tt: average travel time; v: average speed)
Algorithm    Ln/m    Ls/m    Le/m    Lw/m
Webster    18.73    33.02    37.06    10.77
Actuated    16.66    26.11    22.85    9.70
Delay-Based    21.58    68.22    62.01    14.50
3DQN    10.53    19.81    19.85    8.29
Proposed method    9.90    17.80    16.81    9.32
Table 4  Average queue length on each approach under different methods (Ln, Ls, Le, Lw: north, south, east, and west approaches)
Fig. 6  Variation of the average vehicle waiting time in the network under different methods
Fig. 7  Variation of the average travel time during the morning peak over one week under different methods
Fig. 8  Convergence speed of the double dueling deep Q-network algorithm with and without the pre-training module
Fig. 9  Convergence of the average travel time for the double dueling deep Q-network (3DQN) algorithm with the phase duration module versus the traditional 3DQN algorithm
Fig. 10  Model action strategies under different interval time settings
Fig. 11  Comparison of the total green time of each phase for different double dueling deep Q-network algorithms
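The phase green time module evaluated in Fig. 9 is described in the abstract only at a high level: green time is adjusted from the average time headway and the queue length of the current phase. The following is a hedged sketch of one such rule, assuming the queue is converted to a vehicle count and discharged at the measured headway; every constant below is an assumption, not the paper's value.

```python
# A hedged sketch of a headway- and queue-driven green time rule.
def phase_green_time(queue_len_m, avg_headway_s,
                     veh_spacing_m=7.5, startup_lost_s=3.0,
                     g_min=10.0, g_max=60.0):
    n_veh = queue_len_m / veh_spacing_m           # queued vehicles, from queue length
    g = startup_lost_s + n_veh * avg_headway_s    # estimated queue discharge time
    return min(max(g, g_min), g_max)              # clip to min/max green

# Example: a 35 m queue discharging at a 2.2 s headway gets ~13.3 s of green.
print(phase_green_time(35.0, 2.2))
```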
1 WEBSTER F V. Traffic signal settings [R]. London: Road Research Laboratory, 1958.
2 LUO Xiao-qin, WANG Dian-hai, JIN Sheng. Traffic signal actuated control at isolated intersections for heterogeneous traffic [J]. Journal of Jilin University: Engineering and Technology Edition, 2019, 49(3): 695-704.
3 HUNT P, ROBERTSON D, BRETHERTON R, et al. The SCOOT on-line traffic signal optimisation technique [J]. Traffic Engineering and Control, 1982, 23(4): 190-192.
4 GENDERS W, RAZAVI S. Using a deep reinforcement learning agent for traffic signal control [EB/OL]. (2016-11-03) [2023-03-12]. https://arxiv.org/pdf/1611.01142v1.pdf.
5 LI L, LV Y, WANG F Y. Traffic signal timing via deep reinforcement learning [J]. IEEE/CAA Journal of Automatica Sinica, 2016, 3(3): 247-254.
doi: 10.1109/JAS.2016.7508798
6 GAO J, SHEN Y, LIU J, et al. Adaptive traffic signal control: deep reinforcement learning algorithm with experience replay and target network [EB/OL]. (2017-05-08) [2023-03-12]. https://arxiv.org/pdf/1705.02755.pdf.
7 MOUSAVI S S, SCHUKAT M, HOWLEY E. Traffic light control using deep policy-gradient and value-function-based reinforcement learning [J]. IET Intelligent Transport Systems, 2017, 11(7): 417-423.
doi: 10.1049/iet-its.2017.0153
8 WEI H, ZHENG G, YAO H, et al. IntelliLight: a reinforcement learning approach for intelligent traffic light control [C]// Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. [S.l.]: ACM, 2018: 2496-2505.
9 LIANG X, DU X, WANG G, et al. A deep reinforcement learning network for traffic light cycle control [J]. IEEE Transactions on Vehicular Technology, 2019, 68(2): 1243-1253.
doi: 10.1109/TVT.2018.2890726
10 WANG Z, SCHAUL T, HESSEL M, et al. Dueling network architectures for deep reinforcement learning [C]// International Conference on Machine Learning. [S.l.]: Journal of Machine Learning Research, 2016: 1995-2003.
11 SUN Hao, CHEN Chun-lin, LIU Qiong, et al. Traffic signal control method based on deep reinforcement learning [J]. Computer Science, 2020, 47(2): 169-174.
12 LIU Zhi, CAO Shi-peng, SHEN Yang, et al. Signal control of single intersection based on improved deep reinforcement learning method [J]. Computer Science, 2020, 47(12): 226-232.
13 LIU Zhi-min, YE Bao-lin, ZHU Yao-dong, et al. Traffic signal control method based on deep reinforcement learning [J]. Journal of Zhejiang University: Engineering Science, 2022, 56(6): 1249-1256.
14 ZHAO Qian, ZHANG Ling, ZHAO Gang, et al. Reinforcement learning traffic signal control under double-loop phase-structure constraints [J]. Journal of Transportation Engineering and Information, 2023, 21(1): 19-28.
doi: 10.19961/j.cnki.1672-4747.2022.05.010
15 CHU T, WANG J, CODECÀ L, et al. Multi-agent deep reinforcement learning for large-scale traffic signal control [J]. IEEE Transactions on Intelligent Transportation Systems, 2019, 21(3): 1086-1095.
16 LI Z, YU H, ZHANG G, et al. Network-wide traffic signal control optimization using a multi-agent deep reinforcement learning [J]. Transportation Research Part C: Emerging Technologies, 2021, 125: 103059.
doi: 10.1016/j.trc.2021.103059
17 ZHENG G, ZANG X, XU N, et al. Diagnosing reinforcement learning for traffic signal control [EB/OL]. (2019-05-12) [2023-03-12]. https://arxiv.org/pdf/1905.04716.pdf.
18 WEI H, CHEN C, ZHENG G, et al. PressLight: learning max pressure control to coordinate traffic signals in arterial network [C]// Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. [S.l.]: ACM, 2019: 1290-1298.
19 HESTER T, VECERIK M, PIETQUIN O, et al. Deep Q-learning from demonstrations [C]// Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence. [S.l.]: AAAI Press, 2018: 3223-3230.
20 VARAIYA P. The max-pressure controller for arbitrary networks of signalized intersections [M]// UKKUSURI S, OZBAY K. Advances in dynamic network modeling in complex transportation systems. [S.l.]: Springer, 2013: 27-66.