We introduce Reliable Policy Iteration (RPI) and Conservative RPI (CRPI), variants of Policy Iteration (PI) and Conservative PI (CPI) that retain tabular guarantees under function approximation. RPI uses a novel Bellman-constrained optimization for policy evaluation. We show that RPI restores the textbook monotonicity of value estimates and that these estimates provably lower-bound the true return; moreover, their limit partially satisfies the unprojected Bellman equation. CRPI shares RPI’s evaluation but updates policies conservatively, maximizing a new performance-difference lower bound that explicitly accounts for errors induced by function approximation. CRPI inherits RPI’s guarantees and, crucially, admits per-step improvement bounds. In initial simulations, RPI and CRPI outperform PI and its variants.
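To make the evaluation step concrete, the following is a minimal sketch of Bellman-constrained policy evaluation in the spirit of RPI, assuming a known tabular MDP and a linear value class V = Φw; the linear-programming formulation, the feature matrix `Phi`, and the helper `rpi_evaluate` are illustrative assumptions, not the paper's exact construction. The sketch rests on a standard fact: any V with V ≤ T^π V componentwise lower-bounds the true value V^π, because the Bellman operator T^π is monotone and V^π is its fixed point, so maximizing the estimate under these constraints yields the tightest such lower bound within the value class.

```python
# Hypothetical sketch of Bellman-constrained evaluation (not the paper's
# exact formulation): maximize sum_s V(s) over V = Phi @ w subject to
# V <= T^pi V componentwise, which guarantees V <= V^pi.
import numpy as np
from scipy.optimize import linprog

def rpi_evaluate(Phi, P_pi, r_pi, gamma):
    """Return a value estimate V_hat = Phi @ w with V_hat <= V^pi componentwise."""
    # V <= r_pi + gamma * P_pi @ V  rewrites to  (Phi - gamma * P_pi @ Phi) w <= r_pi.
    A_ub = Phi - gamma * (P_pi @ Phi)
    # linprog minimizes, so negate the objective to maximize 1^T Phi w.
    c = -Phi.sum(axis=0)
    res = linprog(c, A_ub=A_ub, b_ub=r_pi,
                  bounds=[(None, None)] * Phi.shape[1])  # weights are unconstrained
    assert res.success, res.message
    return Phi @ res.x

# Toy 2-state MDP under a fixed policy (illustrative numbers).
gamma = 0.9
P_pi = np.array([[0.8, 0.2], [0.3, 0.7]])   # transition matrix under pi
r_pi = np.array([1.0, 0.5])                  # expected rewards under pi
Phi = np.eye(2)                              # full (tabular) features for the demo

V_hat = rpi_evaluate(Phi, P_pi, r_pi, gamma)
V_true = np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)  # exact V^pi
print(V_hat, V_true)  # V_hat <= V_true componentwise
```

With full (tabular) features the sketch recovers V^π exactly; with a restricted Φ it returns a provable componentwise lower bound, consistent with the lower-bound and monotonicity guarantees claimed above.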