Journal of ZheJiang University (Engineering Science)  2023, Vol. 57 Issue (12): 2524-2532    DOI: 10.3785/j.issn.1008-973X.2023.12.019
    
Deep reinforcement learning approach to signal control combined with domain experience
Meng ZHANG, Dian-hai WANG, Sheng JIN*
College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China

Abstract  

To address the problems of unstable training, slow convergence and frequent phase switching in signal control methods based on deep reinforcement learning, a signal control method that integrates domain expertise was proposed by adding a pre-training module and a phase green time calculation module to the double-dueling deep Q network (3DQN) algorithm. The pre-training module guided the 3DQN agent to imitate the strategy of the Max-Pressure method by jointly optimizing the double Q-learning loss, a supervised margin classification loss and a regularization loss, thereby stabilizing and accelerating the training process. The phase green time calculation module dynamically adjusted the green time of the current phase according to its average time headway and queue length, reducing lost green time. The method was validated on the SUMO simulation platform, using the intersection of Airport City Avenue and Boao Road in Xiaoshan District, Hangzhou as an example. The simulation results show that the proposed method effectively improves the training speed of the traditional 3DQN algorithm, and that, compared with traditional control methods, it significantly reduces the average vehicle travel time and improves intersection operation efficiency.
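To make the two domain-experience components concrete, the following is a minimal PyTorch sketch of the three-part pre-training objective described above (double Q-learning loss, supervised margin classification loss, and regularization loss), in the spirit of Deep Q-learning from Demonstrations [19], with the Max-Pressure policy supplying the demonstration actions. All function names, hyper-parameter values and tensor shapes are illustrative assumptions, not the authors' published implementation.

```python
import torch
import torch.nn.functional as F

def pretraining_loss(online_net, target_net, states, actions, rewards,
                     next_states, demo_actions,
                     gamma=0.99, margin=0.8, margin_weight=1.0, l2_weight=1e-5):
    """Combined pre-training loss: double Q-learning + supervised margin
    classification + L2 regularization (DQfD-style, cf. [19]).
    demo_actions are the actions the Max-Pressure demonstrator would take
    in the same states. Hyper-parameter values here are assumptions."""
    q = online_net(states)  # shape [B, num_actions]

    # 1) Double Q-learning loss: the online net selects the next action,
    #    the target net evaluates it.
    with torch.no_grad():
        next_best = online_net(next_states).argmax(dim=1, keepdim=True)
        td_target = rewards + gamma * target_net(next_states).gather(1, next_best).squeeze(1)
    q_taken = q.gather(1, actions.unsqueeze(1)).squeeze(1)
    td_loss = F.smooth_l1_loss(q_taken, td_target)

    # 2) Supervised margin classification loss: the demonstrated action's
    #    Q-value must exceed every other action's Q-value by `margin`.
    margins = torch.full_like(q, margin)
    margins.scatter_(1, demo_actions.unsqueeze(1), 0.0)  # zero margin on demo action
    q_demo = q.gather(1, demo_actions.unsqueeze(1)).squeeze(1)
    margin_loss = ((q + margins).max(dim=1).values - q_demo).mean()

    # 3) L2 regularization on the online network's weights.
    l2_loss = sum(p.pow(2).sum() for p in online_net.parameters())

    return td_loss + margin_weight * margin_loss + l2_weight * l2_loss
```

A plausible reading of the phase green time calculation module, under the assumption that the green time is set so that the standing queue can discharge at the measured average headway and is then clamped between minimum and maximum bounds (the bounds and start-up lost time below are assumptions, not values from the paper):

```python
def phase_green_time(queue_len_veh, avg_headway_s,
                     g_min=10.0, g_max=60.0, startup_lost_s=3.0):
    """Green time needed to discharge the current phase's queue at its
    measured average headway, bounded to [g_min, g_max]. The bounds and
    start-up lost time are illustrative assumptions."""
    g = startup_lost_s + queue_len_veh * avg_headway_s
    return min(max(g, g_min), g_max)
```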



Key words: traffic signal control; reinforcement learning; deep reinforcement learning; supervised learning; pre-training
Received: 16 March 2023      Published: 27 December 2023
CLC:  U 491.4  
Fund:  National Natural Science Foundation of China (52131202, 52072340, 71901193); Zhejiang Provincial Natural Science Foundation for Distinguished Young Scholars (LR23E080002)
Corresponding Authors: Sheng JIN     E-mail: 22112093@zju.edu.cn;jinsheng@zju.edu.cn
Cite this article:

Meng ZHANG, Dian-hai WANG, Sheng JIN. Deep reinforcement learning approach to signal control combined with domain experience. Journal of ZheJiang University (Engineering Science), 2023, 57(12): 2524-2532.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2023.12.019     OR     https://www.zjujournals.com/eng/Y2023/V57/I12/2524


Fig.1 Schematic diagram of deep reinforcement learning model
Fig.2 State vector scale
Fig.3 Combination diagram of intersection phase scheme
Fig.4 Signal control model framework based on double-dueling deep Q network
Fig.5 Flow distribution diagram of intersection throughout the day
Vehicle type  N  P/%
Sedan  2 606  63.07
Minibus  1 440  34.85
Coach  22  0.53
Transit bus  5  0.12
Taxi  7  0.17
Truck  48  1.16
Engineering vehicle  4  0.10
Tab.1 Distribution of passing vehicle types at the intersection during peak hours
Vehicle type  L/m  D
Sedan  5.0  Normc(1, 0.1, 0.5, 2)
Minibus  6.5  Normc(1, 0.1, 0.5, 2)
Coach  14.0  Normc(1, 0.05, 0.5, 2)
Transit bus  12.0  Normc(1, 0.05, 0.5, 2)
Taxi  5.0  Normc(1, 0.1, 0.5, 2)
Truck  7.1  Normc(1, 0.05, 0.5, 2)
Engineering vehicle  16.5  Normc(1, 0.05, 0.5, 2)
Tab.2 Simulation experiment vehicle type settings
Algorithm  tw/s  tt/s  v/(m·s⁻¹)
Webster  35.89  113.46  7.45
Actuated  33.77  110.04  7.70
Delay-Based  64.85  142.52  6.36
3DQN  15.01  90.75  8.96
Proposed method  13.00  87.39  9.30
Tab.3 Comparison of control effects among different methods
Algorithm  Ln/m  Ls/m  Le/m  Lw/m
Webster  18.73  33.02  37.06  10.77
Actuated  16.66  26.11  22.85  9.70
Delay-Based  21.58  68.22  62.01  14.50
3DQN  10.53  19.81  19.85  8.29
Proposed method  9.90  17.80  16.81  9.32
Tab.4 Average queue length of each approach under different methods
Fig.6 Variation of average waiting time for vehicles in road network under different methods
Fig.7 Average travel time during morning peak hours for one week under different methods
Fig.8 Convergence speed comparison of double-dueling deep Q network algorithms with and without pretrained module
Fig.9 Convergence analysis of double-dueling deep Q network (3DQN) algorithms with phase duration module and traditional 3DQN algorithm on average travel time
Fig.10 Schematic diagram of model action strategy under different interval settings
Fig.11 Comparison diagram of total green light duration for each phase with different double-dueling deep Q network algorithms
[1]   WEBSTER F V. Traffic signal settings [R]. London: Road Research Laboratory, 1958.
[2]   LUO Xiao-qin, WANG Dian-hai, JIN Sheng. Traffic signal actuated control at isolated intersections for heterogeneous traffic [J]. Journal of Jilin University: Engineering and Technology Edition, 2019, 49(3): 695-704.
[3]   HUNT P, ROBERTSON D, BRETHERTON R, et al. The SCOOT on-line traffic signal optimisation technique [J]. Traffic Engineering and Control, 1982, 23(4): 190-192.
[4]   GENDERS W, RAZAVI S. Using a deep reinforcement learning agent for traffic signal control [EB/OL]. (2016-11-03) [2023-03-12]. https://arxiv.org/pdf/1611.01142v1.pdf.
[5]   LI L, LV Y, WANG F Y. Traffic signal timing via deep reinforcement learning [J]. IEEE/CAA Journal of Automatica Sinica, 2016, 3(3): 247-254. doi: 10.1109/JAS.2016.7508798.
[6]   GAO J, SHEN Y, LIU J, et al. Adaptive traffic signal control: deep reinforcement learning algorithm with experience replay and target network [EB/OL]. (2017-05-08) [2023-03-12]. https://arxiv.org/pdf/1705.02755.pdf.
[7]   MOUSAVI S S, SCHUKAT M, HOWLEY E. Traffic light control using deep policy-gradient and value-function-based reinforcement learning [J]. IET Intelligent Transport Systems, 2017, 11(7): 417-423. doi: 10.1049/iet-its.2017.0153.
[8]   WEI H, ZHENG G, YAO H, et al. IntelliLight: a reinforcement learning approach for intelligent traffic light control [C]// Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. [S.l.]: ACM, 2018: 2496-2505.
[9]   LIANG X, DU X, WANG G, et al. A deep reinforcement learning network for traffic light cycle control [J]. IEEE Transactions on Vehicular Technology, 2019, 68(2): 1243-1253. doi: 10.1109/TVT.2018.2890726.
[10]   WANG Z, SCHAUL T, HESSEL M, et al. Dueling network architectures for deep reinforcement learning [C]// International Conference on Machine Learning. [S.l.]: Journal of Machine Learning Research, 2016: 1995-2003.
[11]   SUN Hao, CHEN Chun-lin, LIU Qiong, et al. Traffic signal control method based on deep reinforcement learning [J]. Computer Science, 2020, 47(2): 169-174.
[12]   LIU Zhi, CAO Shi-peng, SHEN Yang, et al. Signal control of single intersection based on improved deep reinforcement learning method [J]. Computer Science, 2020, 47(12): 226-232.
[13]   LIU Zhi-min, YE Bao-lin, ZHU Yao-dong, et al. Traffic signal control method based on deep reinforcement learning [J]. Journal of Zhejiang University: Engineering Science, 2022, 56(6): 1249-1256.
[14]   ZHAO Qian, ZHANG Ling, ZHAO Gang, et al. Reinforcement learning traffic signal control under double-loop phase-structure constraints [J]. Journal of Transportation Engineering and Information, 2023, 21(1): 19-28. doi: 10.19961/j.cnki.1672-4747.2022.05.010.
[15]   CHU T, WANG J, CODECÀ L, et al. Multi-agent deep reinforcement learning for large-scale traffic signal control [J]. IEEE Transactions on Intelligent Transportation Systems, 2019, 21(3): 1086-1095.
[16]   LI Z, YU H, ZHANG G, et al. Network-wide traffic signal control optimization using a multi-agent deep reinforcement learning [J]. Transportation Research Part C: Emerging Technologies, 2021, 125: 103059. doi: 10.1016/j.trc.2021.103059.
[17]   ZHENG G, ZANG X, XU N, et al. Diagnosing reinforcement learning for traffic signal control [EB/OL]. (2019-05-12) [2023-03-12]. https://arxiv.org/pdf/1905.04716.pdf.
[18]   WEI H, CHEN C, ZHENG G, et al. PressLight: learning max pressure control to coordinate traffic signals in arterial network [C]// Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. [S.l.]: ACM, 2019: 1290-1298.
[19]   HESTER T, VECERIK M, PIETQUIN O, et al. Deep Q-learning from demonstrations [C]// Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence. [S.l.]: AAAI Press, 2018: 3223-3230.
[20]   VARAIYA P. The max-pressure controller for arbitrary networks of signalized intersections [M]// UKKUSURI S, OZBAY K. Advances in Dynamic Network Modeling in Complex Transportation Systems. [S.l.]: Springer, 2013: 27-66.