Policy gradient algorithm and its convergence analysis for two-player zero-sum Markov games
Zhuo WANG, Yongqiang LI, Yu FENG, Yuanjing FENG
Fig. 3 Nash convergence metric of the MG-PG algorithm under policy parameterization
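The caption refers to a Nash convergence metric but does not define it here. A common choice for two-player zero-sum games is the duality gap, which in the two-player zero-sum case equals NashConv (the total best-response gain of both players). It is nonnegative and is zero exactly at a Nash equilibrium. The sketch below shows how such a gap can be computed for softmax-parameterized policies. It assumes a single-state game (a matrix game, the simplest special case of a Markov game), and the helper names `softmax`, `softmax_pg_grad` and `nash_gap` are hypothetical. The metric actually plotted in Fig. 3 may be defined differently.

```python
import numpy as np

def softmax(theta):
    """Softmax policy parameterization: map logits theta to a probability vector."""
    z = theta - np.max(theta)  # shift by the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def softmax_pg_grad(theta, q):
    """Gradient of pi(theta)^T q with respect to the logits theta.

    For softmax, d pi / d theta = diag(pi) - pi pi^T, so the gradient is
    pi * (q - pi^T q). Here q holds the expected payoff of each action
    (e.g. A @ y for the row player); assumed single-state setting.
    """
    pi = softmax(theta)
    return pi * (q - pi @ q)

def nash_gap(A, x, y):
    """Duality gap of mixed strategies (x, y) in the zero-sum matrix game where
    the row player maximizes x^T A y and the column player minimizes it.

    gap = max_i (A y)_i - min_j (x^T A)_j >= 0, with equality iff (x, y) is a
    Nash equilibrium. Hypothetical stand-in for the paper's convergence metric.
    """
    best_row = np.max(A @ y)  # row player's best-response value against y
    best_col = np.min(x @ A)  # column player's best-response value against x
    return float(best_row - best_col)

# Matching pennies: the unique Nash equilibrium is uniform play by both players.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x = softmax(np.zeros(2))
y = softmax(np.zeros(2))
print(nash_gap(A, x, y))  # 0.0
```

In a policy gradient loop, one would update the row player's logits along `softmax_pg_grad(theta_x, A @ y)`, update the column player's logits along the negative of `softmax_pg_grad(theta_y, A.T @ x)`, and record `nash_gap` at each iteration to produce a curve of the kind shown in the figure. Plain simultaneous updates of this form are not guaranteed to converge, so the sketch makes no convergence claim.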