World from Motion: Generative Dynamic Gaussian Reconstruction from Monocular Video

Liyuan Zhu¹,² Shengyu Huang² Amrita Mazumdar² Tianye Li² Zan Gojcic² Gordon Wetzstein¹ Iro Armeni¹ Shalini De Mello² Alex Trevithick²

¹Stanford University ²NVIDIA

Abstract

Generative 4D reconstruction from monocular video

World from Motion improves dynamic 3D Gaussian reconstructions by using a video generator as a controllable prior. We condition generation on a persistent 4D representation, sample new dynamic viewpoints, and distill the generated observations back into the reconstruction.

Side-by-side overlay of the initial reconstruction (Initial) and the WfM result (WfM).

Interactive Viewer

Explore the dynamic Gaussian reconstructions

Browser-based 4D Gaussian previews with scene switching and camera controls.

Results

Quantitative Results

Table 1

State-of-the-art 4D Reconstruction

4D Reconstruction Benchmark on DyCheck

Method	Covisible mPSNR ↑	Covisible mSSIM ↑	Covisible mLPIPS ↓
Shape-of-Motion	17.32	0.598	0.296
MoSca	19.32	0.706	0.264
WorldTree	19.75	0.728	0.240
ViDAR	19.69	0.713	0.223
World-from-Motion	20.26	0.732	0.215
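
For readers unfamiliar with the "covisible" metrics in Table 1, the following is a minimal sketch of a masked PSNR, assuming the metrics follow the DyCheck convention of scoring only pixels inside a per-frame covisibility mask. The function name `masked_psnr` and its arguments are illustrative; this is not the evaluation code used for the table, and masked SSIM and LPIPS need their own masked variants.

```python
import numpy as np


def masked_psnr(pred, target, mask, max_val=1.0):
    """PSNR restricted to the pixels selected by a boolean mask.

    pred, target: float arrays of shape (H, W, 3) with values in [0, max_val].
    mask: boolean array of shape (H, W); True marks pixels that are scored.
    (All names here are hypothetical, chosen for this illustration.)
    """
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        raise ValueError("mask selects no pixels")

    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)

    # Squared error per pixel and channel; boolean indexing keeps only
    # the masked pixels, giving an array of shape (num_masked, 3).
    sq_err = (pred - target) ** 2
    mse = sq_err[mask].mean()

    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

Pixels outside the mask do not affect the score at all, so regions never observed in the training video cannot penalize (or reward) a method under this metric.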

Table 2

Conditioning on a persistent 4D representation produces the best camera control.

4D Novel-View Synthesis Benchmark on DyCheck

Method	mPSNR ↑	mSSIM ↑	mLPIPS ↓
ReCamMaster	10.96	0.262	0.755
GEN3C	12.06	0.260	0.679
TrajectoryCrafter	13.06	0.320	0.656
Vista4D	14.14	0.310	0.514
World-from-Motion	18.45	0.635	0.362
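
As a reading aid, the margins in Tables 1 and 2 can be worked out directly from the reported numbers. The sketch below copies the table values and, for each metric, finds the strongest baseline and the difference to World-from-Motion. The dictionary layout and the helper `margins` are illustrative and not part of any released code.

```python
# Values copied from Tables 1 and 2: (mPSNR, mSSIM, mLPIPS) per method.
TABLE_1 = {
    "Shape-of-Motion": (17.32, 0.598, 0.296),
    "MoSca": (19.32, 0.706, 0.264),
    "WorldTree": (19.75, 0.728, 0.240),
    "ViDAR": (19.69, 0.713, 0.223),
    "World-from-Motion": (20.26, 0.732, 0.215),
}
TABLE_2 = {
    "ReCamMaster": (10.96, 0.262, 0.755),
    "GEN3C": (12.06, 0.260, 0.679),
    "TrajectoryCrafter": (13.06, 0.320, 0.656),
    "Vista4D": (14.14, 0.310, 0.514),
    "World-from-Motion": (18.45, 0.635, 0.362),
}
# mPSNR and mSSIM are higher-is-better; mLPIPS is lower-is-better.
HIGHER_IS_BETTER = (True, True, False)


def margins(table, ours="World-from-Motion"):
    """Per metric, return (strongest baseline, ours minus that baseline)."""
    result = []
    for i, higher in enumerate(HIGHER_IS_BETTER):
        baselines = {m: v[i] for m, v in table.items() if m != ours}
        pick = max if higher else min
        best = pick(baselines, key=baselines.get)
        result.append((best, round(table[ours][i] - baselines[best], 3)))
    return result


for name, table in (("Table 1", TABLE_1), ("Table 2", TABLE_2)):
    print(name, margins(table))
```

On Table 2 this gives +4.31 mPSNR and -0.152 mLPIPS relative to Vista4D, and +0.315 mSSIM relative to TrajectoryCrafter. On Table 1 the margins are smaller: +0.51 mPSNR and +0.004 mSSIM over WorldTree, and -0.008 mLPIPS relative to ViDAR.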

Acknowledgements

We thank Yang Zheng, Zhengfei Kuang, Lior Yariv, and Jianhao Zheng for fruitful discussions. We also thank Yijia Weng and Jiahui Lei for providing evaluation details for MoSca, Kuan Heng Lin for providing Vista4D evaluation details, and Michal Nazarczuk and Eduardo Pérez-Pellitero for providing evaluation details for ViDAR. This website builds on the templates from RealmDreamer and CAT4D.

BibTeX

@misc{zhu2026worldfrommotion,
  title = {World from Motion: Generative Dynamic Gaussian Reconstruction from Monocular Video},
  author = {Liyuan Zhu and Shengyu Huang and Amrita Mazumdar and Tianye Li and Zan Gojcic and Gordon Wetzstein and Iro Armeni and Shalini De Mello and Alex Trevithick},
  year = {2026}
}

The More Views We Sample, the Better Reconstruction We Get

[Plots: mPSNR ↑, mSSIM ↑, mLPIPS ↓]

WfM improves the 3D motion

Reconstruction Guidance
