Detecting other agents and forecasting their behavior is an integral part of the modern robotic autonomy stack, especially in safety-critical scenarios entailing human-robot interaction such as autonomous driving. Due to the importance of these components, there has been a significant amount of interest and research in perception and trajectory forecasting, resulting in a wide variety of approaches. Common to most works, however, is the use of the same few accuracy-based evaluation metrics, e.g., intersection-over-union, displacement error, log-likelihood, etc. While these metrics are informative, they are task-agnostic and outputs that are evaluated as equal can lead to vastly different outcomes in downstream planning and decision making. In this work, we take a step back and critically assess current evaluation metrics, proposing task-aware metrics as a better measure of performance in systems where they are deployed. Experiments on an illustrative simulation as well as real-world autonomous driving data validate that our proposed task-aware metrics are able to account for outcome asymmetry and provide a better estimate of a model’s closed-loop performance.