Journal of Zhejiang University (Engineering Science)  2024, Vol. 58 Issue (8): 1671-1680    DOI: 10.3785/j.issn.1008-973X.2024.08.014
Traffic Engineering and Civil Engineering
Traffic signal control method based on asynchronous advantage actor-critic
Baolin YE 1,2, Ruitao SUN 1,2, Weimin WU 3, Bin CHEN 2, Qing YAO 4
1. School of Information Science and Engineering, Zhejiang Sci-Tech University, Hangzhou 310018, China
2. Jiaxing Key Laboratory of Smart Transportations, Jiaxing University, Jiaxing 314001, China
3. State Key Laboratory of Industrial Control Technology, Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou 310027, China
4. School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
Abstract:

A single-intersection traffic signal control method based on the asynchronous advantage actor-critic (A3C) algorithm was proposed, aiming at the high model learning and decision-making costs of existing traffic signal control methods based on deep reinforcement learning. At the input side of the model, vehicle weight gain networks were constructed from two different dimensions, namely the intersection and the lane, to preprocess the collected vehicle state information. A new reward mechanism was designed, and an A3C algorithm integrating the vehicle weight gain networks was proposed. Simulation results obtained with the microscopic traffic simulator SUMO (simulation of urban mobility) show that, compared with traditional traffic signal control methods and benchmark reinforcement learning methods, the proposed method achieves better traffic signal control performance under low, medium, and high traffic flow conditions.

Key words: traffic signal control; deep reinforcement learning; A3C; weight gain network
Received: 2023-08-26    Published: 2024-07-23
CLC:  TP 393  
Supported by: National Natural Science Foundation of China (61603154); Zhejiang Provincial Natural Science Foundation (LTGS23F030002); Jiaxing Applied Basic Research Program (2023AY11034); "Pioneer" and "Leading Goose" R&D Program of Zhejiang Province (2023C01174); Open Research Project of the State Key Laboratory of Industrial Control Technology (ICT2022B52).
About the author: YE Baolin (1984—), male, associate professor, Ph.D., engaged in intelligent transportation research. orcid.org/0000-0002-5369-6246. E-mail: yebaolin@zjxu.edu.cn

Cite this article:

Baolin YE, Ruitao SUN, Weimin WU, Bin CHEN, Qing YAO. Traffic signal control method based on asynchronous advantage actor-critic[J]. Journal of Zhejiang University (Engineering Science), 2024, 58(8): 1671-1680.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2024.08.014        https://www.zjujournals.com/eng/CN/Y2024/V58/I8/1671

Fig. 1  Schematic diagram of the intersection
Fig. 2  Schematic diagram of the intersection phase configuration
Fig. 3  Schematic diagram of the action space
Reward (Rw, Rp)    Change-rate interval (Wchange_rate, Pchange_rate)
 4                 (0.35, +∞)
 3                 (0.25, 0.35]
 2                 (0.15, 0.25]
 1                 [0, 0.15]
−1                 [−0.15, 0)
−2                 [−0.25, −0.15)
−3                 [−0.35, −0.25)
−4                 (−∞, −0.35)
Table 1  Segmented reward rules
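Read as a piecewise function, Table 1 maps the relative change rate of a traffic metric (waiting time W or stop count P) to an integer reward in {±1, …, ±4}; the mapping is symmetric about zero. A minimal sketch of this rule in Python (the function name and the shared treatment of the two metrics are our own reading of the table):

```python
def segment_reward(change_rate: float) -> int:
    """Piecewise reward of Table 1: larger improvements (or degradations)
    in the change rate map to larger positive (or negative) rewards."""
    mag = abs(change_rate)
    if mag > 0.35:
        level = 4
    elif mag > 0.25:
        level = 3
    elif mag > 0.15:
        level = 2
    else:
        level = 1
    return level if change_rate >= 0 else -level

assert segment_reward(0.30) == 3 and segment_reward(-0.20) == -2
```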
Fig. 4  Schematic diagram of information interaction between the vehicle weight gain network and the A2C network
Fig. 5  Schematic diagram of the intersection-level vehicle weight gain network
Fig. 6  Schematic diagram of the lane-level vehicle weight gain network
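The paper's exact gain computation (Eqs. (6), (7)) is not reproduced on this page. Purely to illustrate the idea behind Figs. 5 and 6, the sketch below implements a generic lane-level reweighting module in PyTorch: a small network scores each lane's raw features, and the resulting weights rescale the state before it reaches the actor-critic. All names and sizes are hypothetical, not the paper's.

```python
import torch
import torch.nn as nn

class LaneWeightGain(nn.Module):
    """Hypothetical lane-level vehicle weight gain module: scores each
    lane's raw features and rescales them before the A3C input."""

    def __init__(self, feat_dim: int, hidden: int = 64):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, lane_feats: torch.Tensor) -> torch.Tensor:
        # lane_feats: (n_lanes, feat_dim); one learned weight per lane
        w = torch.softmax(self.score(lane_feats).squeeze(-1), dim=0)
        return lane_feats * w.unsqueeze(-1)   # weighted state for the agent

gain = LaneWeightGain(feat_dim=4)
weighted = gain(torch.rand(8, 4))             # 8 lanes, 4 features each
```

An intersection-level variant would score aggregated features of the whole intersection in the same way, which is presumably the distinction drawn between Fig. 5 and Fig. 6.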
Algorithm 1  XVWG-A3C algorithm
1) Set the global shared parameter vectors $ \boldsymbol{\theta} $ and $ {\boldsymbol{\theta}}_{v} $ and the global shared counter T; let the thread-specific parameters be $ {\boldsymbol{\theta}}' $ and $ {\boldsymbol{\theta}}_{v}' $.
2) Initialize the thread step counter t ← 1.
3) Train the A3C network and the weight gain network XVWG:
1  repeat
2    Reset gradients $ {\mathrm{d}}\boldsymbol{\theta} $ ← 0 and $ {\mathrm{d}}{\boldsymbol{\theta}}_{v} $ ← 0; synchronize thread parameters $ {\boldsymbol{\theta}}' = \boldsymbol{\theta} $, $ {\boldsymbol{\theta}}_{v}' = {\boldsymbol{\theta}}_{v} $; $ t_{\mathrm{start}} = t $;
3    Initialize the neural network input s and compute the state $ s_t $ produced by the weight network according to Eqs. (6), (7);
4    repeat
5      Execute action $ a_t $ according to the policy $ \pi(a_t|s_t;{\boldsymbol{\theta}}') $, obtaining the reward $ r_t $ and the weighted state $ s_{t+1} $;
6      Update t ← t+1, T ← T+1;
7    until $ s_t $ is a terminal state or $ t - t_{\mathrm{start}} = t_{\max} $;
8    If $ s_{t+1} $ is a terminal state, R = 0; otherwise $ R = V(s_{t+1};{\boldsymbol{\theta}}_v') $;
9    for $ i \in \{t-1,\cdots,t_{\mathrm{start}}\} $ do
10     $ R \leftarrow r_i + \gamma R $;
11     Accumulate gradients: $ {\mathrm{d}}{\boldsymbol{\theta}} \leftarrow {\mathrm{d}}{\boldsymbol{\theta}} + \nabla_{{\boldsymbol{\theta}}'} \ln \pi(a_i|s_i;{\boldsymbol{\theta}}')\,(R - V(s_i;{\boldsymbol{\theta}}_v')) $; $ {\mathrm{d}}{\boldsymbol{\theta}}_v \leftarrow {\mathrm{d}}{\boldsymbol{\theta}}_v + \partial(R - V(s_i;{\boldsymbol{\theta}}_v'))^2/\partial{\boldsymbol{\theta}}_v' $;
12   end for
13   Asynchronously update $ \boldsymbol{\theta} $ and $ {\boldsymbol{\theta}}_{v} $ using $ {\mathrm{d}}\boldsymbol{\theta} $ and $ {\mathrm{d}}{\boldsymbol{\theta}}_{v} $; update the vehicle weight gain network parameters Ψ using Eqs. (18)–(22);
14 until T > T_max
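As a concrete reading of steps 1–14, the sketch below implements one worker's n-step update in PyTorch against a dummy environment: it synchronizes thread parameters with the global network, rolls out at most t_max steps, bootstraps R from the critic, accumulates the policy and value losses, and pushes the gradients to the global parameters. The state preprocessing of Eqs. (6), (7) and the XVWG update of Eqs. (18)–(22) are omitted, and all class and variable names are our own.

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, state_dim=8, n_actions=4, hidden=200):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.pi = nn.Linear(hidden, n_actions)   # actor head
        self.v = nn.Linear(hidden, 1)            # critic head

    def forward(self, s):
        h = self.body(s)
        dist = torch.distributions.Categorical(logits=self.pi(h))
        return dist, self.v(h).squeeze(-1)

class DummyEnv:
    """Stand-in for the SUMO intersection: random states and rewards."""
    state_dim, n_actions = 8, 4
    def reset(self):
        self.t = 0
        return torch.rand(self.state_dim)
    def step(self, action):
        self.t += 1
        return torch.rand(self.state_dim), float(torch.randn(())), self.t >= 20

def worker_update(global_net, opt, env, t_max=5, gamma=0.9):
    local = ActorCritic(env.state_dim, env.n_actions)
    local.load_state_dict(global_net.state_dict())       # theta' <- theta
    s, done, traj = env.reset(), False, []
    while not done and len(traj) < t_max:                # rollout (steps 4-7)
        dist, _ = local(s)
        a = dist.sample()
        s2, r, done = env.step(a.item())
        traj.append((s, a, r))
        s = s2
    with torch.no_grad():                                # bootstrap (step 8)
        R = 0.0 if done else local(s)[1].item()
    loss = torch.zeros(())
    for s_i, a_i, r_i in reversed(traj):                 # steps 9-12
        R = r_i + gamma * R
        dist, v = local(s_i)
        adv = R - v
        loss = loss - dist.log_prob(a_i) * adv.detach() + adv.pow(2)
    opt.zero_grad()
    loss.backward()
    for gp, lp in zip(global_net.parameters(), local.parameters()):
        gp.grad = lp.grad                # push local gradients (step 13)
    opt.step()

net = ActorCritic()
worker_update(net, torch.optim.Adam(net.parameters(), lr=2e-5), DummyEnv())
```

In the full algorithm, several such workers run in parallel threads and apply step 13 asynchronously against the shared parameters, which is what distinguishes A3C from the synchronous A2C variant.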
Fig. 7  Framework of the A3C reinforcement learning traffic signal control integrating the vehicle weight gain network
Model                              Parameter                          Value
A3C with weight gain network /     Actor network learning rate        0.00002
conventional A3C                   Critic network learning rate       0.0002
                                   Number of actor network neurons    200
                                   Number of critic network neurons   100
                                   Discount factor                    0.9
                                   Training steps                     200
                                   Training time/s                    7200
DQN                                Learning rate                      0.00002
                                   Number of neurons                  200
                                   Discount factor                    0.9
                                   Training steps                     200
                                   Training time/s                    7200
Table 2  Parameter settings of the deep reinforcement learning models in the comparison experiments
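For reference, Table 2 can be collected into plain Python config dicts (the key names are our own convenience, not identifiers from the paper):

```python
A3C_CONFIG = {
    "actor_lr": 2e-5,       # actor network learning rate
    "critic_lr": 2e-4,      # critic network learning rate
    "actor_hidden": 200,    # neurons in the actor network
    "critic_hidden": 100,   # neurons in the critic network
    "gamma": 0.9,           # discount factor
    "train_steps": 200,
    "train_time_s": 7200,
}
DQN_CONFIG = {"lr": 2e-5, "hidden": 200, "gamma": 0.9,
              "train_steps": 200, "train_time_s": 7200}
```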
Parameter                                   Value
Lane length/m                               100
Average vehicle length/m                    5
Minimum vehicle gap/m                       2.5
Maximum vehicle speed/(m·s⁻¹)               13.89
Maximum vehicle acceleration/(m·s⁻²)        2.6
Maximum vehicle deceleration/(m·s⁻²)        4.6
Yellow light duration/s                     3
Green duration when a phase is held/s       5
Minimum green time of a phase/s             15
Probability of going straight               0.5
Probability of turning left                 0.3
Probability of turning right                0.2
Table 3  Parameter settings of the traffic simulation environment
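A simulation of this kind is typically driven through SUMO's TraCI Python API. As an illustration of how the raw inputs described above (per-lane queues and waiting times) can be polled each step, consider the sketch below; the config file name, traffic light ID, and lane IDs are hypothetical, and the agent logic is elided.

```python
import traci

# Hypothetical single-intersection scenario file and lane IDs.
traci.start(["sumo", "-c", "single_intersection.sumocfg"])
LANES = ["north_in_0", "south_in_0", "east_in_0", "west_in_0"]

for step in range(3600):
    traci.simulationStep()                    # advance the simulation by one step
    queues = [traci.lane.getLastStepHaltingNumber(l) for l in LANES]
    waits = [traci.lane.getWaitingTime(l) for l in LANES]
    # ... feed (queues, waits) through the weight gain network and the agent,
    # then actuate the chosen phase, e.g.:
    # traci.trafficlight.setPhase("tls0", chosen_phase)

traci.close()
```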
Fig. 8  Cumulative reward per episode
Fig. 9  Average vehicle waiting time
Fig. 10  Average vehicle queue length
Fig. 11  Average number of vehicle stops
Control method    Low flow               Medium flow            High flow
                  W/s    L/m    P        W/s    L/m    P        W/s    L/m    P
Fixed-time        13.85  5.83   0.77     15.68  6.28   0.91     17.31  8.62   1.01
Adaptive          5.40   3.73   0.55     7.85   4.00   0.57     13.34  4.61   0.63
DQN               6.14   3.54   0.47     7.38   3.81   0.55     8.32   4.05   0.59
A3C               5.88   3.43   0.46     6.90   3.73   0.49     7.45   3.83   0.56
LVWG-A3C          5.13   3.28   0.43     6.11   3.52   0.46     7.22   3.76   0.52
IVWG-A3C          4.72   3.18   0.40     4.92   3.26   0.43     6.31   3.50   0.47
Table 4  Test results of different traffic signal control methods at a single intersection (W: average vehicle waiting time; L: average queue length; P: average number of stops)