 # Planning and Learning with Adaptive Lookahead


The classical Policy Iteration (PI) algorithm alternates between greedy one-step policy improvement and policy evaluation. Recent literature shows that multi-step lookahead policy improvement leads to a better convergence rate at the expense of increased complexity per iteration. However, prior to running the algorithm, one cannot tell what the best fixed lookahead horizon is. Moreover, within a given run, using a lookahead horizon larger than one is often wasteful. In this work, we propose for the first time to dynamically adapt the multi-step lookahead horizon as a function of the state and of the value estimate. We devise two PI variants and analyze the trade-off between iteration count and computational complexity per iteration. The first variant takes the desired contraction factor as the objective and minimizes the per-iteration complexity. The second variant takes the computational complexity per iteration as input and minimizes the overall contraction factor. We then devise a corresponding DQN-based algorithm with an adaptive tree search horizon. We also include a novel enhancement for on-policy learning: a per-depth value function estimator. Lastly, we demonstrate the efficacy of our adaptive lookahead method in a maze environment and in Atari.
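The core idea above, policy iteration whose improvement step uses an h-step lookahead with a depth chosen per state, can be illustrated on a small tabular MDP. This is a minimal sketch under stated assumptions, not the authors' algorithm: the tabular encoding, the function names, the use of iterative policy evaluation, and especially the depth-selection rule (stop deepening at state `s` once successive lookahead values differ by at most `tol`) are hypothetical stand-ins for the paper's contraction- and complexity-driven criteria. It also does not cover the DQN-based variant.

```python
from typing import List, Tuple

# Hypothetical tabular encoding: P[s][a] is a list of (prob, next_state) pairs,
# R[s][a] is the immediate reward for taking action a in state s.
Transitions = List[List[List[Tuple[float, int]]]]
Rewards = List[List[float]]


def q_values(s: int, V: List[float], P: Transitions, R: Rewards, gamma: float) -> List[float]:
    """One-step action values at state s, bootstrapping from V at the leaves."""
    return [R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a]) for a in range(len(P[s]))]


def bellman_optimality(V, P, R, gamma):
    """Apply the Bellman optimality operator T once to every state."""
    return [max(q_values(s, V, P, R, gamma)) for s in range(len(P))]


def evaluate(policy, P, R, gamma, tol=1e-10):
    """Iterative policy evaluation until the sup-norm change drops below tol."""
    V = [0.0] * len(P)
    while True:
        new_V = [q_values(s, V, P, R, gamma)[policy[s]] for s in range(len(P))]
        if max(abs(x - y) for x, y in zip(new_V, V)) < tol:
            return new_V
        V = new_V


def adaptive_lookahead_pi(P, R, gamma, h_max=4, tol=1e-6, max_iters=100):
    """PI where each state's improvement step uses its own lookahead depth h(s)."""
    n = len(P)
    policy = [0] * n
    V = evaluate(policy, P, R, gamma)
    depths: List[int] = []
    for _ in range(max_iters):
        # W[k] = T^k V for k = 0..h_max. For simplicity the sketch computes these
        # for all states; a real tree search would only expand what it needs.
        W = [V]
        for _k in range(h_max):
            W.append(bellman_optimality(W[-1], P, R, gamma))
        new_policy, depths = [], []
        for s in range(n):
            # Hypothetical depth rule (not the paper's criterion): the smallest
            # h whose value at s barely changes from depth h - 1, capped at h_max.
            h = next((k for k in range(1, h_max + 1)
                      if abs(W[k][s] - W[k - 1][s]) <= tol), h_max)
            depths.append(h)
            # h-step greedy action: one explicit step, then T^(h-1) V at the leaves.
            q = q_values(s, W[h - 1], P, R, gamma)
            new_policy.append(max(range(len(q)), key=q.__getitem__))
        if new_policy == policy:  # policy is stable: stop
            break
        policy = new_policy
        V = evaluate(policy, P, R, gamma)
    return policy, V, depths


if __name__ == "__main__":
    # Toy 3-state chain: action 0 stays, action 1 moves right; reward 1 only in state 2.
    P = [[[(1.0, min(s + a, 2))] for a in range(2)] for s in range(3)]
    R = [[1.0 if s == 2 else 0.0 for _ in range(2)] for s in range(3)]
    policy, V, depths = adaptive_lookahead_pi(P, R, gamma=0.9)
    print(policy, [round(v, 3) for v in V], depths)  # [1, 1, 0] [8.1, 9.0, 10.0] [1, 1, 1]
```

On this toy chain the sketch settles on moving right from states 0 and 1. Summing `depths` in each iteration gives a crude proxy for per-iteration search cost, which hints in miniature at the iteration-count versus per-iteration-complexity trade-off described above.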


 ## Authors


Aviv Rosenberg (Tel-Aviv University, NVIDIA)

[Assaf Hallak](/person/assaf-hallak)

[Shie Mannor](/person/shie-mannor)

[Gal Chechik](/person/gal-chechik)

[Gal Dalal](/person/gal-dalal)

 
 ## Publication Date


Thursday, January 28, 2021

 
 ## Published in


[arXiv](https://arxiv.org/abs/2201.12403)

 
 ## Research Area


[Artificial Intelligence and Machine Learning](/research-area/machine-learning-artificial-intelligence)