Policy gradient algorithm and its convergence analysis for two-player zero-sum Markov games
Zhuo WANG, Yongqiang LI, Yu FENG, Yuanjing FENG

Tab. 1 Algorithm hyperparameters under different parameterized policy settings
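| Policy setting | $\lambda$ | $\alpha_k$ | $\gamma$ | $k_{\max}$ | $n$ |
|---|---|---|---|---|---|
| Tabular softmax | 0.3 | 0.5 | 1.0 | $10^6$ | 1 |
| Neural network | 0.1 | 0.3 | 1.0 | $8\times10^6$ | 1 |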
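Table 1 names a tabular softmax policy parameterization. The sketch below is a generic illustration of that parameterization, not the authors' algorithm: this excerpt does not give their update rule or define the roles of $\lambda$, $\alpha_k$, $\gamma$, and $n$. It shows the tabular softmax map and its exact policy gradient in the simplest zero-sum setting, a single-state matrix game in which player 1 maximizes $x^\top A y$ and player 2 minimizes it. The function names, the step size `alpha`, and the matching-pennies payoff matrix are hypothetical choices made for illustration.

```python
import numpy as np

def softmax(theta: np.ndarray) -> np.ndarray:
    """Tabular softmax over the last axis: pi(a|s) = exp(theta[s,a]) / sum_b exp(theta[s,b]).
    Works for a single logit vector or a |S| x |A| table of logits."""
    z = theta - theta.max(axis=-1, keepdims=True)  # shift logits for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def payoff(theta_x: np.ndarray, theta_y: np.ndarray, A: np.ndarray) -> float:
    """Expected payoff x^T A y of a single-state zero-sum game
    (player 1 maximizes it, player 2 minimizes it)."""
    return float(softmax(theta_x) @ A @ softmax(theta_y))

def softmax_policy_gradients(theta_x, theta_y, A):
    """Exact gradients of x^T A y with respect to each player's softmax logits."""
    x, y = softmax(theta_x), softmax(theta_y)
    gx = A @ y    # value of each row action against the column player's mixed strategy
    gy = A.T @ x  # value of each column action against the row player's mixed strategy
    # For x = softmax(theta) and fixed g: d(x^T g)/d theta_a = x_a * (g_a - x^T g)
    grad_x = x * (gx - x @ gx)
    grad_y = y * (gy - y @ gy)
    return grad_x, grad_y

def gda_step(theta_x, theta_y, A, alpha):
    """One simultaneous step: ascent for the maximizer, descent for the minimizer.
    `alpha` is a hypothetical step size, not a value taken from the paper."""
    grad_x, grad_y = softmax_policy_gradients(theta_x, theta_y, A)
    return theta_x + alpha * grad_x, theta_y - alpha * grad_y

if __name__ == "__main__":
    A = np.array([[1.0, -1.0], [-1.0, 1.0]])  # matching pennies (hypothetical example)
    tx, ty = np.array([0.2, -0.1]), np.array([0.5, 0.0])
    tx, ty = gda_step(tx, ty, A, alpha=0.1)
    print(softmax(tx), softmax(ty), payoff(tx, ty, A))
```

The gradient formula follows from the softmax Jacobian $\operatorname{diag}(x) - x x^\top$ applied to the payoff vector. Plain simultaneous ascent-descent such as `gda_step` is not guaranteed to converge to an equilibrium in zero-sum games, so it should be read only as a demonstration of the gradient computation.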