Robust and Controllable Object-Centric Learning through Energy-based Models

Humans are remarkably good at understanding and reasoning about complex visual scenes. The capability to decompose low-level observations into discrete objects allows us to build a grounded abstract representation and identify the compositional structure of the world. Accordingly, it is a crucial step for machine learning models to be capable of inferring objects and their properties from visual scenes without explicit supervision.

Sample-Efficient Safety Assurances using Conformal Prediction

When deploying machine learning models in high-stakes robotics applications, the ability to detect unsafe situations is crucial. Early warning systems can provide alerts when an unsafe situation is imminent (in the absence of corrective action). To reliably improve safety, these warning systems should have a provable false negative rate; i.e., of the situations that are unsafe, fewer than a fraction ϵ will occur without an alert.
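To make the false negative guarantee concrete, the following is a minimal sketch of a standard split-conformal threshold calibration, not necessarily the procedure used in the paper. It assumes a scalar warning score where larger values mean "more unsafe", a held-out set of scores recorded in situations known to be unsafe, and exchangeability between those calibration samples and future unsafe situations. The function names `calibrate_alert_threshold` and `should_alert` are hypothetical.

```python
import math


def calibrate_alert_threshold(unsafe_scores, epsilon):
    """Pick a threshold tau from calibration scores of known-unsafe situations.

    Assumption: alerts are issued when score >= tau, and future unsafe
    situations are exchangeable with the calibration samples. Under that
    assumption, a new unsafe situation is missed (score < tau) with
    probability at most k / (n + 1) <= epsilon.
    """
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must be in (0, 1)")
    n = len(unsafe_scores)
    # Number of calibration scores we can afford to rank at or below tau.
    k = math.floor(epsilon * (n + 1))
    if k == 0:
        # Too few calibration samples for this epsilon: always alert.
        return -math.inf
    # tau is the k-th smallest calibration score.
    return sorted(unsafe_scores)[k - 1]


def should_alert(score, tau):
    """Issue an alert whenever the warning score reaches the threshold."""
    return score >= tau


if __name__ == "__main__":
    # Illustrative scores only; these are not data from the paper.
    scores = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6]
    tau = calibrate_alert_threshold(scores, epsilon=0.25)
    print("threshold:", tau)
    print("alert at 0.15?", should_alert(0.15, tau))
```

The sample efficiency in this sketch comes from the rank argument: the bound k / (n + 1) holds for any number of calibration samples n, and when n is too small to support the requested ϵ the threshold falls back to alerting on every input.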

Semantic Anomaly Detection with Large Language Models

As robots acquire increasingly sophisticated skills and see increasingly complex and varied environments, the threat of an edge case or anomalous failure is ever-present. For example, Tesla cars have seen interesting failure modes ranging from autopilot disengagements due to inactive traffic lights carried by trucks to phantom braking caused by images of stop signs on roadside billboards. These system-level failures are not due to failures of any individual component of the autonomy stack but rather to system-level deficiencies in semantic reasoning.

Learning for CasADi: Data-driven Models in Numerical Optimization

While real-world problems are often challenging to model analytically, deep learning excels in modeling complex processes from data. Existing optimization frameworks like CasADi facilitate seamless usage of solvers but face challenges when integrating learned process models into numerical optimizations. To address this gap, we present the Learning for CasADi (L4CasADi) framework, enabling the seamless integration of PyTorch-learned models with CasADi for efficient and potentially hardware-accelerated numerical optimization.

Pushing the Limits? Frame Rate Benefits to Players for up to 500 Hz in First Person Shooter Games

Computer games -- and computer game players -- often drive technology improvements, with graphics cards and monitors pushing the limits of display technologies. High frame rates, in particular, promise to provide lower latencies and smoother game visuals to gamers, which is especially important for competitive first person shooter (FPS) game players. What is not well known is to what extent gamers benefit from ultra-high frame rates in terms of player performance and quality of experience. This paper studies the effects of frame rates -- especially high frame rates -- on FPS game players.

Toward Understanding Display Size for FPS Esports Aiming

Gamers use a variety of display sizes, though for PC gaming, monitors in the 24 to 27 inch size range have become most popular. First person shooter (FPS) games, popular among many PC gamers, represent a genre where hand-eye coordination is particularly central to the player's performance in game. In a carefully designed set of experiments on FPS aiming, we compare player performance across a range of display sizes.

NVIDIA Isaac GR00T N1: An Open Foundation Model for Humanoid Robots

At NVIDIA, we are developing AI solutions to enable general-purpose humanoid robots to understand the human world, follow language instructions, and perform diverse tasks. A robust Vision-Language-Action (VLA) model is crucial for such advanced capabilities. To this end, we developed GR00T N1, a generalist robot model trained on a diverse dataset that includes egocentric human videos, real and simulated robot trajectories, and synthetic data.

Cosmos-Reason1: From Physical AI Common Sense to Embodied Decisions

Physical AI systems need to perceive, understand, and perform complex actions in the physical world. In this paper, we present the Cosmos-Reason1 models that can understand the physical world and generate appropriate embodied decisions (e.g., next step action) in natural language through long chain-of-thought reasoning processes. We begin by defining key capabilities for Physical AI reasoning, with a focus on physical common sense and embodied reasoning.

Real-Time Anomaly Detection and Reactive Planning with Large Language Models

Foundation models, e.g., large language models (LLMs), trained on internet-scale data possess zero-shot generalization capabilities that make them a promising technology towards detecting and mitigating out-of-distribution failure modes of robotic systems. Fully realizing this promise, however, poses two challenges: (i) mitigating the considerable computational expense of these models such that they may be applied online, and (ii) incorporating their judgement regarding potential anomalies into a safe control framework.

MTP: Multi-Hypothesis Tracking and Prediction for Reduced Error Propagation

Recently, there has been tremendous progress in developing each individual module of the standard perception-planning robot autonomy pipeline, including detection, tracking, prediction of other agents' trajectories, and ego-agent trajectory planning. Nevertheless, there has been less attention given to the principled integration of these components, particularly in terms of the characterization and mitigation of cascading errors. This paper addresses the problem of cascading errors by focusing on the coupling between the tracking and prediction modules.