SimFoundry: Modular and Automated Scene Generation
for Policy Learning and Evaluation

Nadun Ranawaka1,2*, Josiah Wong1,3*, Wei-Lin Pai3, Wei-Teng Chu3, Tianyuan Dai1,4, Masoud Moghani1,5, Hang Yin3, Yunfan Jiang1,3, Wesley Durbano1*, Brandon Huynh1*, Yu Fang1, Linxi Fan1, Danfei Xu1,2, Ruohan Zhang3, Li Fei-Fei3, Bowen Wen1, Ajay Mandlekar1†, Yuke Zhu1,4†
1NVIDIA   2Georgia Institute of Technology   3Stanford University   4The University of Texas at Austin   5University of Toronto
*Equal contribution.   †Equal advising.
Under Review

TL;DR: SimFoundry takes a single video of a real-world scene and automatically turns it into an interactive simulation environment for scalable policy training and evaluation.

SimFoundry Reconstructed Scenes

Explore the reconstructed 3D scenes interactively. Drag to rotate, scroll to zoom.

Hybrid scene: 3D Gaussian Splat background + textured object meshes

Note: 3D viewers are not supported on mobile. Please visit on desktop for the interactive experience.

SimFoundry scenes can be used to evaluate real-world policies in simulation: simulated and real-world success rates have a mean Pearson correlation of 0.911.

Real-to-Sim Eval

Side-by-side comparison of policy execution in the real world versus SimFoundry.

Policy: DreamZero

Real World
Success Rate: 16%
SimFoundry
Success Rate: 28%

Sim vs. Real Success Rates

SimFoundry predictions closely track real-world performance and outperform a state-of-the-art baseline (PolaRiS, Jain et al. 2025).

[Interactive plot; legend: SimFoundry, PolaRiS, Tasks]

Evaluation Metrics Comparison

Higher Pearson r and lower MMRV (Mean Maximum Rank Violation) indicate better sim-to-real correlation.

[Bar charts: Average Pearson r and Average MMRV, each comparing SimFoundry and PolaRiS]
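
As a rough illustration of the two metrics, the sketch below computes Pearson r and MMRV from paired per-policy success rates. It assumes MMRV follows the Mean Maximum Rank Violation definition used in prior sim-to-real evaluation work; that definition is an assumption here and is not stated on this page. The function names and the numbers in the usage lines are hypothetical and are not SimFoundry results.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("Pearson r is undefined when one sequence is constant")
    return cov / math.sqrt(var_x * var_y)


def mmrv(sim, real):
    """Mean Maximum Rank Violation (assumed definition).

    For each policy i, take the largest real-world performance gap
    |real[i] - real[j]| over all policies j that the simulator ranks
    in the opposite order relative to i, then average over i.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        worst = 0.0
        for j in range(n):
            sim_order = sim[i] < sim[j]
            real_order = real[i] < real[j]
            if sim_order != real_order:
                worst = max(worst, abs(real[i] - real[j]))
        total += worst
    return total / n


# Hypothetical success rates for three policies (illustrative values only).
sim_rates = [0.5, 0.1, 0.9]
real_rates = [0.2, 0.4, 0.8]
print("Pearson r:", pearson_r(sim_rates, real_rates))
print("MMRV:", mmrv(sim_rates, real_rates))
```

In this toy input the simulator swaps the ranking of the first two policies, so MMRV is nonzero even though the correlation is positive; the two metrics capture different kinds of disagreement.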

Policies Trained on SimFoundry Data Transfer Zero-Shot to the Real World

Sim-to-Real Policy Training

Simulation Eval
Success rate: 100%
Real World Eval
Success rate: 96%
Real World Eval (Unseen Objects)
Success rate: 100%
Success rate: 92%
Success rate: 43%
Success rate: 42%

Co-Training on SimFoundry Sim Data and Small Amounts of Real Data Produces Even Better Policies

Evaluation results on the DROID platform in the real world

Success rate (%)

Task              Sim-only   Real-only   Co-train
Stack Dishware    88         100         100
Store Marker      24         60          92
Put Away Trash    20         56          92

SimFoundry Digital Cousins Enable Policy Generalization to Novel Objects and Tasks

Digital Twin and Cousin Generation

Selected results across diverse scenes and tasks.

Input video (2x)
Reconstructed twins
Reconstructed cousins

DROID: evaluation on objects that are unseen in both simulation and the real world

Digital twin and cousins of the wooden drawer organizer

SimFoundry generates meaningful Task Cousins that facilitate policy generalization

SimFoundry data boosts real-world VLA performance

Across 13 tasks (finetuning π0.5-DROID): 28% without SimFoundry, 46% with SimFoundry
On 7 held-out tasks, unseen even in sim (finetuning π0.5-base): 0% without SimFoundry, 29% with SimFoundry
Sim Eval
Real Eval

SimFoundry outperforms state-of-the-art methods on scene reconstruction accuracy

Reconstructed Objects

Click an object in the real scene to view its reconstructed, physics-ready mesh.

Reconstructed Scene vs. SAM3D Output

SimFoundry recovers more accurate object meshes and poses than SAM3D, especially on occluded, cluttered scenes. The pipeline also generalizes across input modalities, supporting open-source datasets and synthetically generated images out of the box.

[Panels: Input Image, SimFoundry Output, SAM3D Output]
Scene     Metric                    SAM3D Zero-Shot   SimFoundry Zero-Shot   SimFoundry Tuned
Easy      Chamfer Distance (m) ↓    0.0081            0.0042                 0.0026
Easy      Position Error (m) ↓      0.0160            0.0060                 0.0041
Medium    Chamfer Distance (m) ↓    0.0087            0.0047                 0.0033
Medium    Position Error (m) ↓      0.0180            0.0076                 0.0057
Hard      Chamfer Distance (m) ↓    0.0088            0.0091                 0.0039
Hard      Position Error (m) ↓      0.0220            0.0180                 0.0073
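
For readers unfamiliar with the two reconstruction metrics, here is a minimal sketch of how Chamfer distance and position error could be computed on point sets sampled from meshes. Conventions for Chamfer distance vary (squared vs. unsquared distances, sum vs. mean of the two directions); this sketch assumes the mean of the two unsquared one-way averages, which may differ from the exact variant used for the numbers reported here. The function names and sample points are hypothetical.

```python
import math


def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance between two point sets (assumed variant).

    Each one-way term is the average distance from a point in one set to
    its nearest neighbor in the other set; the result averages both terms.
    Brute force, O(len(a) * len(b)), which is fine for small examples.
    """
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return 0.5 * (one_way(points_a, points_b) + one_way(points_b, points_a))


def position_error(estimated_position, true_position):
    """Euclidean distance between estimated and reference object positions."""
    return math.dist(estimated_position, true_position)


# Hypothetical points in meters: the second set is the first shifted 1 cm in z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reconstructed = [(0.0, 0.0, 0.01), (1.0, 0.0, 0.01)]
print("Chamfer distance (m):", chamfer_distance(reference, reconstructed))
print("Position error (m):", position_error((0.003, 0.004, 0.0), (0.0, 0.0, 0.0)))
```

For dense meshes, a spatial index such as a k-d tree would replace the brute-force nearest-neighbor search.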

Automatic Background Reconstruction

SimFoundry erases foreground objects to produce a background-only video, which is refined and used to train a 3D Gaussian Splat for high-fidelity background reconstruction.

Real-World Video
Background Only
SimFoundry Reconstruction

SimFoundry Pipeline

SimFoundry extracts relevant per-object information (segmentation masks, depth, etc.), generates 3D visual meshes via 2D-to-3D generation models, and compiles the final output scene by annotating relevant physical parameters and sanity-checking the overall scene configuration in a physics simulator. SimFoundry additionally supports diverse simulated augmentations of objects, scenes, and tasks. Its modular design ensures that as individual foundation models improve, the pipeline improves with them: no redesign is required, only a component swap.

SimFoundry pipeline: Physical Scene Extraction, Sim Scene Generation, and Cousin Augmentation
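
To make the "component swap" idea concrete, the following is a minimal sketch of a modular pipeline in which each stage is a plain callable that can be replaced independently. It is not SimFoundry's actual code or API: `ScenePipeline`, `SceneObject`, the stage names, and all stub functions and values are hypothetical placeholders standing in for the real extraction, 2D-to-3D generation, physics annotation, and simulator checks.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Dict, List, Optional


@dataclass
class SceneObject:
    """Per-object record filled in as the pipeline progresses (hypothetical)."""
    name: str
    mesh: Optional[str] = None
    physics: Dict[str, float] = field(default_factory=dict)


@dataclass(frozen=True)
class ScenePipeline:
    """Each stage is a swappable callable; none of these are real APIs."""
    extract_objects: Callable[[str], List[SceneObject]]
    generate_mesh: Callable[[SceneObject], str]
    annotate_physics: Callable[[SceneObject], Dict[str, float]]
    sanity_check: Callable[[List[SceneObject]], bool]

    def run(self, video_path: str) -> List[SceneObject]:
        objects = self.extract_objects(video_path)
        for obj in objects:
            obj.mesh = self.generate_mesh(obj)
            obj.physics = self.annotate_physics(obj)
        if not self.sanity_check(objects):
            raise ValueError("scene failed the sanity check")
        return objects


# Stub stages with placeholder outputs, standing in for real models.
def stub_extract(video_path):
    return [SceneObject("mug"), SceneObject("bowl")]

def stub_mesh_v1(obj):
    return f"{obj.name}_v1.obj"

def stub_mesh_v2(obj):
    return f"{obj.name}_v2.obj"

def stub_physics(obj):
    return {"mass_kg": 0.3, "friction": 0.8}  # made-up values

def stub_check(objects):
    return all(o.mesh for o in objects)


pipeline = ScenePipeline(stub_extract, stub_mesh_v1, stub_physics, stub_check)
scene_v1 = pipeline.run("scene.mp4")

# Swapping in a newer mesh generator leaves every other stage untouched.
upgraded = replace(pipeline, generate_mesh=stub_mesh_v2)
scene_v2 = upgraded.run("scene.mp4")
```

Keeping stages as independent callables with narrow inputs and outputs is one simple way to get the property described above, where improving one model only requires replacing one field.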