The safety-critical nature of autonomous vehicle (AV) operation necessitates the development of task-relevant algorithms that reason about safety at the system level, not just at the component level. To reason about how a perception failure affects overall system performance, such algorithms must contend with several challenges: the complexity of AV stacks, high uncertainty in operating environments, and the need for real-time performance. To overcome these challenges, we introduce SPARQ (Safety evaluation for Perception And Recovery Q-network), a Q-network that evaluates the safety of a plan generated by a planning algorithm, accounting for perception failures that the planning process may have overlooked. SPARQ can be queried at runtime to assess whether a proposed plan is safe to execute or poses potential safety risks. If a plan is deemed unsafe, SPARQ can proactively trigger a recovery action to prevent safety violations, e.g., commanding the AV to execute a fallback safe policy. We validate SPARQ’s ability to improve safety over baselines in two simulators, including closed-loop settings with complex multi-agent interactions: (i) CARLA, an urban autonomous driving simulator, and (ii) NVIDIA Isaac Sim, a photo-realistic simulator for autonomous systems. We demonstrate that integrating SPARQ with a planner improves safety by 5x while incurring only a 10% reduction in the planner’s performance, a favorable trade-off between safety and performance. We further illustrate SPARQ’s ability to generalize to real-world scenarios: SPARQ trained entirely in simulation achieves a 10% increase in safety when deployed on real-world data from the nuPlan-Vegas dataset. SPARQ represents a step towards task-relevant safety algorithms that can unlock the full potential of AVs.
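The runtime query-and-recover loop described above can be illustrated with a minimal Python sketch. It is not SPARQ itself and does not model perception failures. The `SafetyGate` class, the convention that a higher score means safer, the threshold of 0, and the hand-written `toy_q` scorer (obstacle clearance minus a 2 m margin) are all hypothetical stand-ins for the trained Q-network. Likewise, `stop_in_place` stands in for whatever fallback safe policy the AV actually uses.

```python
from dataclasses import dataclass
from math import hypot
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration only: a state holds the ego and one obstacle
# position, and a plan is a list of (x, y) waypoints.
State = Dict[str, Tuple[float, float]]
Plan = List[Tuple[float, float]]


@dataclass
class SafetyGate:
    """Runtime check that scores a proposed plan with a safety Q-function and
    switches to a fallback policy when the score falls below a threshold."""

    q_safety: Callable[[State, Plan], float]  # assumed convention: higher = safer
    fallback_policy: Callable[[State], Plan]  # e.g. a stopping maneuver
    threshold: float = 0.0                    # hypothetical decision threshold

    def select(self, state: State, proposed: Plan) -> Tuple[Plan, bool]:
        """Return (plan_to_execute, recovery_triggered)."""
        if self.q_safety(state, proposed) >= self.threshold:
            return proposed, False
        # Plan judged unsafe: trigger recovery by executing the fallback policy.
        return self.fallback_policy(state), True


def toy_q(state: State, plan: Plan) -> float:
    """Stand-in for a trained Q-network: clearance to the obstacle minus a 2 m margin."""
    ox, oy = state["obstacle"]
    clearance = min(hypot(x - ox, y - oy) for x, y in plan)
    return clearance - 2.0


def stop_in_place(state: State) -> Plan:
    """Toy fallback policy: hold the current ego position."""
    return [state["ego"]] * 3


gate = SafetyGate(q_safety=toy_q, fallback_policy=stop_in_place)
state = {"ego": (0.0, 0.0), "obstacle": (10.0, 0.0)}
plan_a, recovered_a = gate.select(state, [(0.0, 0.0), (2.0, 3.0), (4.0, 5.0)])
plan_b, recovered_b = gate.select(state, [(0.0, 0.0), (5.0, 0.0), (9.0, 0.0)])
print(recovered_a, recovered_b)  # False True
```

Keeping the scorer and the fallback as injected callables separates the safety evaluation from the planner. This mirrors the abstract's framing of a plan-level safety check layered on top of an existing planner, though the actual interface used by SPARQ may differ.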