Policy gradient algorithm and its convergence analysis for two-player zero-sum Markov games
Zhuo WANG, Yongqiang LI, Yu FENG, Yuanjing FENG
Fig. 2 Schematic diagrams of the state changes of Oshi-Zumo under different settings