Cosmos World Foundation Model Platform for Physical AI

Physical AI needs to be trained digitally first. It needs a digital twin of itself, the policy model, and a
digital twin of the world, the world model. In this paper, we present the Cosmos World Foundation Model
Platform to help developers build customized world models for their Physical AI setups. We position
a world foundation model as a general-purpose world model that can be fine-tuned into customized
world models for downstream applications. Our platform covers a video curation pipeline, pre-trained

Julius Berner

Julius Berner is a research scientist in NVIDIA’s Fundamental Generative AI Research (GenAIR) Group. He did his postdoc at Caltech and received his PhD from the University of Vienna in 2023. His research focuses on (probabilistic) machine learning with applications in the natural sciences, including generative modeling, sampling, and neural solvers for partial differential equations and inverse problems. More information can be found on his personal website.

Automatic Tracing in Task-Based Runtime Systems

Implicitly parallel task-based runtime systems often perform dynamic analysis to discover dependencies in and extract parallelism from sequential programs. Dependence analysis becomes expensive as task granularity drops below a threshold. Tracing techniques have been developed where programmers annotate repeated program fragments (traces) issued by the application, and the runtime system memoizes the dependence analysis for those fragments, greatly reducing overhead when the fragments are executed again.
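The memoization idea above can be illustrated with a toy sketch. All names here (`Runtime`, `Task`, the trace id) are hypothetical and not from any real runtime system; the point is only that the quadratic dependence analysis runs once per annotated trace and is replayed on repeat executions.

```python
class Task:
    def __init__(self, name, reads, writes):
        self.name, self.reads, self.writes = name, set(reads), set(writes)

def analyze_dependencies(tasks):
    """O(n^2) dynamic dependence analysis: task j depends on task i
    if there is a flow, output, or anti dependence between them."""
    deps = {j: set() for j in range(len(tasks))}
    for j, tj in enumerate(tasks):
        for i in range(j):
            ti = tasks[i]
            if (ti.writes & (tj.reads | tj.writes)) or (tj.writes & ti.reads):
                deps[j].add(i)
    return deps

class Runtime:
    def __init__(self):
        self._trace_cache = {}  # trace id -> memoized dependence graph

    def execute_trace(self, trace_id, tasks):
        # First execution of an annotated trace pays for the analysis;
        # later executions replay the memoized dependence graph.
        if trace_id not in self._trace_cache:
            self._trace_cache[trace_id] = analyze_dependencies(tasks)
        return self._trace_cache[trace_id]
```

For example, a trace with a producer `Task("a", [], ["x"])` followed by a consumer `Task("b", ["x"], ["y"])` yields the dependence graph `{0: set(), 1: {0}}`, and a second `execute_trace` call with the same trace id skips the analysis entirely.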

Composing Distributed Computations Through Task and Kernel Fusion

We introduce Diffuse, a system that dynamically performs task and kernel fusion in distributed, task-based runtime systems. The key component of Diffuse is an intermediate representation of distributed computation that enables the necessary analyses for the fusion of distributed tasks to be performed in a scalable manner. We pair task fusion with a JIT compiler to fuse together the kernels within fused tasks.
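A minimal sketch of the kernel-fusion payoff, under heavy simplifying assumptions (scalar elementwise ops on Python lists; Diffuse's actual IR, distributed analyses, and JIT compilation are far more involved): a producer-consumer chain of elementwise kernels is collapsed into a single fused kernel that makes one pass over the data instead of materializing an intermediate array per task.

```python
def fuse_elementwise(fns):
    """Fuse a chain of scalar elementwise ops into one kernel."""
    def fused(xs):
        out = []
        for v in xs:          # single loop; no per-task intermediate arrays
            for f in fns:
                v = f(v)
            out.append(v)
        return out
    return fused

# Three tasks that would otherwise each launch a kernel and
# materialize a temporary array:
double = lambda v: 2 * v
inc    = lambda v: v + 1
square = lambda v: v * v

fused = fuse_elementwise([double, inc, square])
# fused([1, 2]) == [9, 25]
```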

NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows

We present a method for automatically modifying a NeRF representation based on a single observation of a non-rigidly transformed version of the original scene. Our method defines the transformation as a 3D flow, specifically as a weighted linear blending of rigid transformations of 3D anchor points that are defined on the surface of the scene. To identify anchor points, we introduce a novel correspondence algorithm that first matches RGB-based pairs, then leverages multi-view information and 3D reprojection to robustly filter false positives in two steps.
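The flow definition above can be sketched as follows. This is not the paper's implementation: the inverse-distance weighting and the anchor placement here are illustrative assumptions; only the core operation, displacing a point by a weighted linear blend of rigid transforms attached to anchor points, matches the abstract.

```python
import math

def apply_rigid(R, t, p):
    """Apply a rigid transform (3x3 rotation R, translation t) to point p."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]

def blend_flow(p, anchors, transforms, eps=1e-8):
    """Map point p through a weighted linear blend of the anchors'
    rigid transforms; weights are inverse distance to each anchor
    (an assumption for this sketch, not the paper's scheme)."""
    ws = [1.0 / (math.dist(p, a) + eps) for a in anchors]
    total = sum(ws)
    ws = [w / total for w in ws]
    out = [0.0, 0.0, 0.0]
    for w, (R, t) in zip(ws, transforms):
        q = apply_rigid(R, t, p)
        for i in range(3):
            out[i] += w * q[i]
    return out
```

For instance, a point equidistant from two anchors, one carrying a unit translation along x and the other the identity, is displaced by half that translation.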

Neural Implicit Representation for Building Digital Twins of Unknown Articulated Objects

We address the problem of building digital twins of unknown articulated objects from two RGBD scans of the object at different articulation states. We decompose the problem into two stages, each addressing distinct aspects. Our method first reconstructs object-level shape at each state, then recovers the underlying articulation model including part segmentation and joint articulations that associate the two states.