World from Motion: Generative Dynamic Gaussian Reconstruction from Monocular Video

¹Stanford University, ²NVIDIA
Abstract

Generative 4D reconstruction from monocular video

World from Motion improves dynamic 3D Gaussian reconstructions by using a video generator as a controllable prior. We condition generation on a persistent 4D representation, sample new dynamic viewpoints, and distill the generated observations back into the reconstruction.
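The three steps named above (condition on the 4D representation, sample new viewpoints, distill the generated observations) can be read as a refinement loop. The sketch below only illustrates that control flow. Every name in it (`refine`, `sample_cameras`, `render`, `generate`, `distill`, `rounds`) is hypothetical, and the page does not specify how cameras are sampled, how the generator is conditioned, which losses are used, or how many rounds are run.

```python
from typing import Any, Callable, Iterable


def refine(
    scene: Any,
    sample_cameras: Callable[[Any], Iterable[Any]],
    render: Callable[[Any, Any], Any],
    generate: Callable[[Any], Any],
    distill: Callable[[Any, Any, Any], Any],
    rounds: int = 1,
) -> Any:
    """Schematic generate-then-distill loop (hypothetical, not the authors' code).

    scene          -- the persistent 4D representation (e.g. dynamic Gaussians)
    sample_cameras -- proposes new dynamic viewpoints for the current scene
    render         -- renders the scene along a viewpoint to condition generation
    generate       -- stands in for the video generator used as a prior
    distill        -- updates the scene from the generated observations
    """
    for _ in range(rounds):
        for camera in sample_cameras(scene):
            conditioning = render(scene, camera)        # condition on the 4D scene
            generated = generate(conditioning)          # sample a novel-view video
            scene = distill(scene, camera, generated)   # fold it back into the scene
    return scene
```

Passing the stages in as callables keeps the sketch independent of any particular renderer or video model; in practice each stage would be a substantial component in its own right.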

Side-by-side overlay of the initial reconstruction and the WfM result.
Interactive Viewer

Explore the dynamic Gaussian reconstructions

Browser-based 4D Gaussian previews with scene switching and camera controls.

Results

Quantitative Results

Table 1

State-of-the-art 4D Reconstruction

4D Reconstruction Benchmark on DyCheck

Method               Covisible mPSNR ↑   Covisible mSSIM ↑   Covisible mLPIPS ↓
Shape-of-Motion      17.32               0.598               0.296
MoSca                19.32               0.706               0.264
WorldTree            19.75               0.728               0.240
ViDAR                19.69               0.713               0.223
World-from-Motion    20.26               0.732               0.215
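For readers unfamiliar with masked metrics, the sketch below shows one way a PSNR restricted to a covisibility mask can be computed. It is the standard PSNR formula, 10 · log10(MAX² / MSE), evaluated only over masked pixels. It is not the DyCheck evaluation code: the function name `masked_psnr`, the `max_val` parameter, and the choice to average squared error over color channels are assumptions, and the benchmark's own mask computation and per-frame averaging are not described on this page.

```python
import numpy as np


def masked_psnr(pred, target, mask, max_val=1.0):
    """PSNR over pixels where `mask` is True (illustrative, not the benchmark code).

    pred, target -- float images of shape (H, W) or (H, W, C) in [0, max_val]
    mask         -- boolean array of shape (H, W); True marks covisible pixels
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mask = np.asarray(mask, dtype=bool)

    sq_err = (pred - target) ** 2
    if sq_err.ndim == 3:
        # Average over color channels to get one error value per pixel.
        sq_err = sq_err.mean(axis=-1)

    if not mask.any():
        raise ValueError("mask selects no pixels")

    mse = sq_err[mask].mean()  # mean squared error over masked pixels only
    if mse == 0.0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

Restricting the error to the mask means that regions never observed in the input video do not penalize a method, which is the usual motivation for reporting covisible metrics.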

Acknowledgements

We thank Yang Zheng, Zhengfei Kuang, Lior Yariv, and Jianhao Zheng for fruitful discussions. We also thank Yijia Weng and Jiahui Lei for providing evaluation details for MoSca, Kuan Heng Lin for providing Vista4D evaluation details, and Michal Nazarczuk and Eduardo Pérez-Pellitero for providing evaluation details for ViDAR. This website builds on the templates from RealmDreamer and CAT4D.

BibTeX

@misc{zhu2026worldfrommotion,
  title = {World from Motion: Generative Dynamic Gaussian Reconstruction from Monocular Video},
  author = {Liyuan Zhu and Shengyu Huang and Amrita Mazumdar and Tianye Li and Zan Gojcic and Gordon Wetzstein and Iro Armeni and Shalini De Mello and Alex Trevithick},
  year = {2026}
}