Reinforcement Learning

Improve Agents without Retraining: Parallel Tree Search with Off-Policy Correction

Tree Search (TS) is crucial to some of the most influential successes in reinforcement learning. Here, we tackle two major challenges with TS that limit its usability: *distribution shift* and *scalability*. We first discover and analyze a …

Known unknowns: Learning novel concepts using exploratory reasoning-by-elimination

Cite the paper

If you use the contents of this project, please cite our paper.

@article{hagrawal2021unknown,
  title={Known unknowns: Learning novel concepts using exploratory reasoning-by-elimination},
  author={Harsh Agrawal and Eli Meirom and Yuval Atzmon and Shie Mannor and Gal Chechik},
  journal={Uncertainty in artificial intelligence},
  year={2021}
}

Acting in Delayed Environments with Non-Stationary Markov Policies

The standard Markov Decision Process (MDP) formulation hinges on the assumption that an action is executed immediately after it is chosen. However, this assumption is often unrealistic and can lead to catastrophic failures in applications such as …