Traffic signal control method based on asynchronous advantage actor-critic
Baolin YE, Ruitao SUN, Weimin WU, Bin CHEN, Qing YAO
Tab.1 Reward setting rules
R_w (R_p)    W_change_rate (P_change_rate)
4            (0.35, +∞)
3            (0.25, 0.35]
2            (0.15, 0.25]
1            [0, 0.15]
−1           [−0.15, 0)
−2           [−0.25, −0.15)
−3           [−0.35, −0.25)
−4           (−∞, −0.35)
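
The piecewise rule in Tab.1 can be read as a simple lookup from a change rate to an integer reward. The following Python sketch is only an illustration of that reading: the function name `reward_from_change_rate` and the NaN check are assumptions, not part of the paper, and the same mapping is assumed to apply to both W_change_rate (giving R_w) and P_change_rate (giving R_p), as the shared table header suggests.

```python
import math


def reward_from_change_rate(change_rate: float) -> int:
    """Map a change rate to a reward following the intervals in Tab.1.

    Hypothetical helper: the name and the NaN handling are assumptions.
    Interval endpoints (open vs. closed) follow the table as printed.
    """
    if math.isnan(change_rate):
        raise ValueError("change_rate must be a number")

    # Non-negative side: (0.35, +inf) -> 4, (0.25, 0.35] -> 3,
    # (0.15, 0.25] -> 2, [0, 0.15] -> 1
    if change_rate > 0.35:
        return 4
    if change_rate > 0.25:
        return 3
    if change_rate > 0.15:
        return 2
    if change_rate >= 0:
        return 1

    # Negative side: [-0.15, 0) -> -1, [-0.25, -0.15) -> -2,
    # [-0.35, -0.25) -> -3, (-inf, -0.35) -> -4
    if change_rate >= -0.15:
        return -1
    if change_rate >= -0.25:
        return -2
    if change_rate >= -0.35:
        return -3
    return -4
```

Note that the table assigns no zero reward: a change rate of exactly 0 falls in [0, 0.15] and yields 1, so the reward jumps from −1 to 1 at zero.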