Journal of ZheJiang University (Engineering Science)  2023, Vol. 57 Issue (12): 2524-2532    DOI: 10.3785/j.issn.1008-973X.2023.12.019
    
Deep reinforcement learning approach to signal control combined with domain experience
Meng ZHANG, Dian-hai WANG, Sheng JIN*
College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China

Abstract  

To address the problems of unstable training, slow convergence and frequent phase switching in signal control methods based on deep reinforcement learning, a signal control method that integrates domain expertise was proposed by adding a pre-training module and a phase green time calculation module to the double-dueling deep Q network (3DQN) algorithm. The pre-training module guided the 3DQN agent to imitate the strategy of the Max-Pressure method by jointly optimizing the double Q-learning loss, a supervised margin classification loss and a regularization loss, thereby stabilizing and accelerating the training process. The phase green time calculation module dynamically adjusted the green time of the current phase according to its average time headway and queue length, reducing lost green time. The method was validated on the SUMO simulation platform, using the intersection of Airport City Avenue and Boao Road in Xiaoshan District, Hangzhou as an example. The simulation results show that the proposed method effectively improves the training speed of the traditional 3DQN algorithm, and that, compared with traditional control methods, it significantly reduces the average vehicle travel time and improves intersection operation efficiency.
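To make the two domain-experience components concrete, the following is a minimal PyTorch sketch of the three-part pre-training objective described above (double Q-learning loss, supervised margin classification loss, and regularization loss), in the spirit of Deep Q-learning from Demonstrations [19], with the Max-Pressure policy supplying the demonstration actions. All function names, hyper-parameter values and tensor shapes are illustrative assumptions, not the authors' published implementation.

```python
import torch
import torch.nn.functional as F

def pretraining_loss(online_net, target_net, states, actions, rewards,
                     next_states, demo_actions,
                     gamma=0.99, margin=0.8, margin_weight=1.0, l2_weight=1e-5):
    """Combined pre-training loss: double Q-learning + supervised margin
    classification + L2 regularization (DQfD-style, cf. [19]).
    demo_actions are the actions the Max-Pressure demonstrator would take
    in the same states. Hyper-parameter values here are assumptions."""
    q = online_net(states)  # shape [B, num_actions]

    # 1) Double Q-learning loss: the online net selects the next action,
    #    the target net evaluates it.
    with torch.no_grad():
        next_best = online_net(next_states).argmax(dim=1, keepdim=True)
        td_target = rewards + gamma * target_net(next_states).gather(1, next_best).squeeze(1)
    q_taken = q.gather(1, actions.unsqueeze(1)).squeeze(1)
    td_loss = F.smooth_l1_loss(q_taken, td_target)

    # 2) Supervised margin classification loss: the demonstrated action's
    #    Q-value must exceed every other action's Q-value by `margin`.
    margins = torch.full_like(q, margin)
    margins.scatter_(1, demo_actions.unsqueeze(1), 0.0)  # zero margin on demo action
    q_demo = q.gather(1, demo_actions.unsqueeze(1)).squeeze(1)
    margin_loss = ((q + margins).max(dim=1).values - q_demo).mean()

    # 3) L2 regularization on the online network's weights.
    l2_loss = sum(p.pow(2).sum() for p in online_net.parameters())

    return td_loss + margin_weight * margin_loss + l2_weight * l2_loss
```

A plausible reading of the phase green time calculation module, under the assumption that the green time is set so that the standing queue can discharge at the measured average headway and is then clamped between minimum and maximum bounds (the bounds and start-up lost time below are assumptions, not values from the paper):

```python
def phase_green_time(queue_len_veh, avg_headway_s,
                     g_min=10.0, g_max=60.0, startup_lost_s=3.0):
    """Green time needed to discharge the current phase's queue at its
    measured average headway, bounded to [g_min, g_max]. The bounds and
    start-up lost time are illustrative assumptions."""
    g = startup_lost_s + queue_len_veh * avg_headway_s
    return min(max(g, g_min), g_max)
```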



Key words: traffic signal control; reinforcement learning; deep reinforcement learning; supervised learning; pre-training
Received: 16 March 2023      Published: 27 December 2023
CLC:  U 491.4  
Fund:  National Natural Science Foundation of China (52131202, 52072340, 71901193); Zhejiang Provincial Natural Science Foundation for Distinguished Young Scholars (LR23E080002)
Corresponding Authors: Sheng JIN     E-mail: 22112093@zju.edu.cn;jinsheng@zju.edu.cn
Cite this article:

Meng ZHANG, Dian-hai WANG, Sheng JIN. Deep reinforcement learning approach to signal control combined with domain experience. Journal of ZheJiang University (Engineering Science), 2023, 57(12): 2524-2532.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2023.12.019     OR     https://www.zjujournals.com/eng/Y2023/V57/I12/2524


Fig.1 Schematic diagram of deep reinforcement learning model
Fig.2 State vector scale
Fig.3 Combination diagram of intersection phase scheme
Fig.4 Signal control model framework based on double-dueling deep Q network
Fig.5 Flow distribution diagram of intersection throughout the day
Vehicle type  N  P/%
Sedan  2 606  63.07
Minibus  1 440  34.85
Coach  22  0.53
Transit bus  5  0.12
Taxi  7  0.17
Truck  48  1.16
Engineering vehicle  4  0.10
Tab.1 Distribution of passing vehicle types at the intersection during peak hours
Vehicle type  L/m  D
Sedan  5.0  Normc(1, 0.1, 0.5, 2)
Minibus  6.5  Normc(1, 0.1, 0.5, 2)
Coach  14.0  Normc(1, 0.05, 0.5, 2)
Transit bus  12.0  Normc(1, 0.05, 0.5, 2)
Taxi  5.0  Normc(1, 0.1, 0.5, 2)
Truck  7.1  Normc(1, 0.05, 0.5, 2)
Engineering vehicle  16.5  Normc(1, 0.05, 0.5, 2)
Tab.2 Simulation experiment vehicle type settings
Algorithm  tw/s  tt/s  v/(m·s⁻¹)
Webster  35.89  113.46  7.45
Actuated  33.77  110.04  7.70
Delay-Based  64.85  142.52  6.36
3DQN  15.01  90.75  8.96
Proposed method  13.00  87.39  9.30
Tab.3 Comparison of control effects among different methods
Algorithm  Ln/m  Ls/m  Le/m  Lw/m
Webster  18.73  33.02  37.06  10.77
Actuated  16.66  26.11  22.85  9.70
Delay-Based  21.58  68.22  62.01  14.50
3DQN  10.53  19.81  19.85  8.29
Proposed method  9.90  17.80  16.81  9.32
Tab.4 Average queue length of each approach under different methods
Fig.6 Variation of average waiting time for vehicles in road network under different methods
Fig.7 Average travel time during morning peak hours for one week under different methods
Fig.8 Convergence speed comparison of double-dueling deep Q network algorithms with and without pretrained module
Fig.9 Convergence analysis of double-dueling deep Q network (3DQN) algorithms with phase duration module and traditional 3DQN algorithm on average travel time
Fig.10 Schematic diagram of model action strategy under different interval settings
Fig.11 Comparison diagram of total green light duration for each phase with different double-dueling deep Q network algorithms
[1]   WEBSTER F V. Traffic signal settings [R]. London: Road Research Laboratory, 1958.
[2]   LUO Xiao-qin, WANG Dian-hai, JIN Sheng. Traffic signal actuated control at isolated intersections for heterogeneous traffic [J]. Journal of Jilin University: Engineering and Technology Edition, 2019, 49(3): 695-704.
[3]   HUNT P, ROBERTSON D, BRETHERTON R, et al. The SCOOT on-line traffic signal optimisation technique [J]. Traffic Engineering and Control, 1982, 23(4): 190-192.
[4]   GENDERS W, RAZAVI S. Using a deep reinforcement learning agent for traffic signal control [EB/OL]. (2016-11-03) [2023-03-12]. https://arxiv.org/pdf/1611.01142v1.pdf.
[5]   LI L, LV Y, WANG F Y. Traffic signal timing via deep reinforcement learning [J]. IEEE/CAA Journal of Automatica Sinica, 2016, 3(3): 247-254. doi: 10.1109/JAS.2016.7508798.
[6]   GAO J, SHEN Y, LIU J, et al. Adaptive traffic signal control: deep reinforcement learning algorithm with experience replay and target network [EB/OL]. (2017-05-08) [2023-03-12]. https://arxiv.org/pdf/1705.02755.pdf.
[7]   MOUSAVI S S, SCHUKAT M, HOWLEY E. Traffic light control using deep policy-gradient and value-function-based reinforcement learning [J]. IET Intelligent Transport Systems, 2017, 11(7): 417-423. doi: 10.1049/iet-its.2017.0153.
[8]   WEI H, ZHENG G, YAO H, et al. IntelliLight: a reinforcement learning approach for intelligent traffic light control [C]// Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. [S.l.]: ACM, 2018: 2496-2505.
[9]   LIANG X, DU X, WANG G, et al. A deep reinforcement learning network for traffic light cycle control [J]. IEEE Transactions on Vehicular Technology, 2019, 68(2): 1243-1253. doi: 10.1109/TVT.2018.2890726.
[10]   WANG Z, SCHAUL T, HESSEL M, et al. Dueling network architectures for deep reinforcement learning [C]// International Conference on Machine Learning. [S.l.]: Journal of Machine Learning Research, 2016: 1995-2003.
[11]   SUN Hao, CHEN Chun-lin, LIU Qiong, et al. Traffic signal control method based on deep reinforcement learning [J]. Computer Science, 2020, 47(2): 169-174.
[12]   LIU Zhi, CAO Shi-peng, SHEN Yang, et al. Signal control of single intersection based on improved deep reinforcement learning method [J]. Computer Science, 2020, 47(12): 226-232.
[13]   LIU Zhi-min, YE Bao-lin, ZHU Yao-dong, et al. Traffic signal control method based on deep reinforcement learning [J]. Journal of Zhejiang University: Engineering Science, 2022, 56(6): 1249-1256.
[14]   ZHAO Qian, ZHANG Ling, ZHAO Gang, et al. Reinforcement learning traffic signal control under double-loop phase-structure constraints [J]. Journal of Transportation Engineering and Information, 2023, 21(1): 19-28. doi: 10.19961/j.cnki.1672-4747.2022.05.010.
[15]   CHU T, WANG J, CODECÀ L, et al. Multi-agent deep reinforcement learning for large-scale traffic signal control [J]. IEEE Transactions on Intelligent Transportation Systems, 2019, 21(3): 1086-1095.
[16]   LI Z, YU H, ZHANG G, et al. Network-wide traffic signal control optimization using a multi-agent deep reinforcement learning [J]. Transportation Research Part C: Emerging Technologies, 2021, 125: 103059. doi: 10.1016/j.trc.2021.103059.
[17]   ZHENG G, ZANG X, XU N, et al. Diagnosing reinforcement learning for traffic signal control [EB/OL]. (2019-05-12) [2023-03-12]. https://arxiv.org/pdf/1905.04716.pdf.
[18]   WEI H, CHEN C, ZHENG G, et al. PressLight: learning max pressure control to coordinate traffic signals in arterial network [C]// Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. [S.l.]: ACM, 2019: 1290-1298.
[19]   HESTER T, VECERIK M, PIETQUIN O, et al. Deep Q-learning from demonstrations [C]// Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence. [S.l.]: AAAI Press, 2018: 3223-3230.
[20]   VARAIYA P. The max-pressure controller for arbitrary networks of signalized intersections [M]// UKKUSURI S, OZBAY K. Advances in Dynamic Network Modeling in Complex Transportation Systems. [S.l.]: Springer, 2013: 27-66.