SoftTreeMax: Exponential Variance Reduction in Policy Gradient via Tree Search
Despite their popularity, policy gradient methods are known to suffer from high gradient variance and high sample complexity. To mitigate this, we introduce SoftTreeMax, a generalization of softmax that takes planning into account.
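To make "a generalization of softmax that takes planning into account" concrete, here is a minimal sketch of a lookahead-softmax policy. It is an illustrative simplification, not the paper's exact SoftTreeMax definition. It assumes a small deterministic MDP given as hypothetical tables `T[s][a]` (next state) and `R[s][a]` (reward), state-action logits `theta[s][a]`, and a uniform behavior policy for expanding the tree. With `depth=0` it reduces to an ordinary softmax over the logits. With `depth>0` each action is scored by its discounted reward over a depth-limited tree plus the discounted leaf logits.

```python
import math
from typing import List

def lookahead_score(s: int, a: int, depth: int, theta: List[List[float]],
                    T: List[List[int]], R: List[List[float]], gamma: float) -> float:
    """Score of taking action a in state s with a depth-limited lookahead.

    Assumption (illustrative only): deterministic transitions T[s][a], rewards
    R[s][a], and continuation actions averaged uniformly (a uniform behavior policy).
    """
    if depth == 0:
        # Leaf of the tree: fall back to the learned logit, as in plain softmax.
        return theta[s][a]
    nxt = T[s][a]
    n_actions = len(theta[nxt])
    # Average the scores of all continuations one level deeper.
    future = sum(lookahead_score(nxt, b, depth - 1, theta, T, R, gamma)
                 for b in range(n_actions)) / n_actions
    return R[s][a] + gamma * future

def soft_tree_max(s: int, depth: int, theta: List[List[float]],
                  T: List[List[int]], R: List[List[float]],
                  gamma: float = 0.9, beta: float = 1.0) -> List[float]:
    """Softmax over lookahead scores; depth=0 gives the standard softmax policy."""
    scores = [beta * lookahead_score(s, a, depth, theta, T, R, gamma)
              for a in range(len(theta[s]))]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Toy example: flat logits, but action 1 always yields reward 1.
theta = [[0.0, 0.0], [0.0, 0.0]]
T = [[0, 1], [0, 1]]
R = [[0.0, 1.0], [0.0, 1.0]]
print(soft_tree_max(0, depth=0, theta=theta, T=T, R=R))  # uniform policy
print(soft_tree_max(0, depth=1, theta=theta, T=T, R=R))  # favors action 1
```

In the toy example the logits carry no preference, so plain softmax (`depth=0`) is uniform, while one step of lookahead shifts probability toward the rewarding action, giving roughly `[0.269, 0.731]`. The variance-reduction properties claimed for SoftTreeMax concern its gradient estimates and are not demonstrated by this sketch.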