Theory

On the Convergence of Single-Timescale Actor-Critic

Global Convergence of Policy Gradient in Average Reward MDPs