Improve Agents without Retraining: Parallel Tree Search with Off-Policy Correction

Publication
Advances in Neural Information Processing Systems (NeurIPS)