Reinforcement Learning

Monotone and Conservative Policy Iteration Beyond the Tabular Case

We introduce Reliable Policy Iteration (RPI) and Conservative RPI (CRPI), variants of Policy Iteration (PI) and Conservative PI (CPI), that retain tabular guarantees under function approximation. RPI uses a novel Bellman-constrained optimization for …

Non-rectangular Robust MDPs with Normed Uncertainty Sets

On the Convergence of Single-Timescale Actor-Critic

Policy Optimized Text-to-Image Pipeline Design

State Entropy Regularization for Robust Reinforcement Learning

RL-RC-DoT: A Block-level RL agent for Task-Aware Video Compression

Video encoders optimize compression for human perception by minimizing reconstruction error under bit-rate constraints. In many modern applications such as autonomous driving, an overwhelming majority of videos serve as input for AI systems …

Gradient Boosting Reinforcement Learning

Policy Gradient via Tree Expansion

Real-Time Rate Control for Task-Aware Video Compression Using Reinforcement Learning

Global Convergence of Policy Gradient in Average Reward MDPs