CVPR 2026 Workshop
In this workshop, we will explore graphics-in-the-loop, physically grounded 4D reconstruction for physical AI digital twins. We aim to encourage discourse in the research community to tackle the real-to-sim-to-real challenge by bridging graphics, vision, and physical modeling.
This workshop will address three key challenges: the scarcity of physically accurate interaction data at scale, the lack of physical grounding in neural 4D representations and video generative models, and the persistent sim-to-real gap of traditional physics-based simulation.
It aims to foster the exchange of ideas among researchers across the vision, graphics, and robotics communities to solve these interdisciplinary challenges toward embodied physical AI.
Physical AI represents the next frontier of artificial intelligence: embodied agents that learn by interacting with the physical world. Training physical AI foundation models, however, demands photorealistic, physically accurate interaction data at a scale and diversity that is implausible to collect in the real world.
Advances in 4D neural representations, such as neural radiance fields and Gaussian splatting, enable unprecedented visual realism but lack the physical grounding needed for sim-ready use. Video diffusion models exhibit visually impressive results, yet they fundamentally lack physical understanding, exhibiting what researchers term "case-based" generalization rather than true physical dynamics. More critically, they do not provide the rich multimodal feedback (e.g., tactile signals and collision responses) required by physically grounded simulators for embodied AI.
Physics-based simulation through traditional computer graphics offers a practical alternative, yet persistent sim-to-real gaps limit its effectiveness. Combining traditional computer graphics with 4D neural representations opens new possibilities for interactive, physically grounded environments for embodied AI.
Computer graphics brings decades of expertise in modeling physics through rigorous frameworks (e.g., rigid-body dynamics and collision detection, soft-body deformation, and fluid simulation), contributing a level of physical fidelity that purely data-driven models struggle to replicate. Graphics simulators also naturally support multimodal sensory feedback, such as appearance changes, deformations, contact forces, and tactile responses, which is critical for manipulation and embodied learning.
Moreover, the explicit geometric and physical representations in graphics enable the controlled, agent-in-the-loop experimentation essential for reinforcement learning. By grounding neural 4D representations with graphics-inspired physics, we can combine the photorealistic rendering capabilities of neural methods with real-world physical consistency, narrowing the sim-to-real gap and enabling more robust policy transfer.
We will host invited speakers and poster presentations of accepted papers.