Understanding and modeling lighting effects are fundamental tasks in computer vision and graphics. Classic physically-based rendering (PBR) accurately simulates light transport, but relies on precise scene representations—explicit 3D geometry, high-quality material properties, and lighting conditions—that are often impractical to obtain in real-world scenarios. Therefore, we introduce DiffusionRenderer, a neural approach that addresses the dual problem of inverse and forward rendering within a holistic framework. Leveraging powerful video diffusion model priors, the inverse rendering model accurately estimates G-buffers from real-world videos, providing an interface for image editing tasks and training data for the rendering model. Conversely, our rendering model generates photorealistic images from G-buffers without explicit light transport simulation. Experiments demonstrate that DiffusionRenderer effectively approximates inverse and forward rendering, consistently outperforming the state of the art. Our model enables practical applications from a single video input—including relighting, material editing, and realistic object insertion.
Method overview. Given an input video, the neural inverse renderer estimates geometry and material properties per pixel. It generates one scene attribute at a time, with a domain embedding indicating the target attribute to generate. Conversely, the neural forward renderer produces photorealistic images given lighting information, geometry, and material buffers. The lighting condition is injected into the base video diffusion model through cross-attention layers. During joint training with both synthetic and real data, we use an optimizable LoRA for the real data sources.
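To make this pipeline concrete, the sketch below chains the two models in Python: the inverse renderer is invoked once per scene attribute under a domain embedding, and the forward renderer consumes the resulting G-buffers with environment-map lighting supplied as a cross-attention condition. This is only an illustrative sketch; the class names, method names, and buffer list are assumptions, not the released API.

import torch

# Hypothetical per-pixel attributes produced by the inverse renderer.
GBUFFER_DOMAINS = ["normal", "depth", "albedo", "roughness", "metallic"]

def estimate_gbuffers(inverse_renderer, video: torch.Tensor) -> dict:
    """Run the neural inverse renderer once per scene attribute.
    A domain embedding tells the video diffusion backbone which buffer to generate.
    """
    gbuffers = {}
    for domain in GBUFFER_DOMAINS:
        domain_emb = inverse_renderer.embed_domain(domain)             # assumed helper
        gbuffers[domain] = inverse_renderer.sample(video, domain_emb)  # one attribute per pass
    return gbuffers

def render(forward_renderer, gbuffers: dict, env_map: torch.Tensor) -> torch.Tensor:
    """Neural forward rendering: photorealistic video from G-buffers plus lighting,
    with no explicit light transport simulation.
    """
    light_tokens = forward_renderer.encode_lighting(env_map)           # assumed helper
    return forward_renderer.sample(gbuffers, cross_attn_cond=light_tokens)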
DiffusionRenderer jointly considers the inverse and forward rendering problems, overcoming limitations of classic physically-based rendering (PBR) methods.
Classic PBR relies on explicit 3D geometry such as meshes. When such geometry is not available, screen-space ray tracing (SSRT) struggles to accurately reproduce shadows and reflections. Our forward renderer synthesizes photorealistic lighting effects without explicit path tracing or 3D geometry.
PBR is also sensitive to errors in G-buffers. SSRT with G-buffers estimated by state-of-the-art inverse rendering models often fails to deliver high-quality results. Our forward renderer is trained to tolerate noisy G-buffers.
Video generation from G-buffers. The forward renderer generates accurate shadows and reflections that are consistent across viewpoints. Notably, these lighting effects are synthesized entirely from an environment map, despite the input G-buffers containing no explicit shadow or reflection information.
Qualitative comparison of forward rendering. Our method generates high-quality inter-reflections and shadows, producing more accurate results than the neural baselines.
We demonstrate the effectiveness of our combined inverse and forward rendering model in the relighting task.
We use the G-buffers estimated by the inverse renderer to relight the scene under different lighting conditions, as sketched below.
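Under the same assumptions as the earlier sketch, relighting reduces to chaining the two models: estimate G-buffers from the input video, then re-render them under a new environment map. The relight helper below is hypothetical and reuses the estimate_gbuffers and render functions defined above.

def relight(inverse_renderer, forward_renderer, video, new_env_map):
    """Relight an input video by chaining inverse and forward rendering."""
    gbuffers = estimate_gbuffers(inverse_renderer, video)    # per-pixel geometry and materials
    return render(forward_renderer, gbuffers, new_env_map)   # resynthesize under the new lighting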
DiffusionRenderer: Neural Inverse and Forward Rendering with Video Diffusion Models
Ruofan Liang*, Zan Gojcic, Huan Ling, Jacob Munkberg, Jon Hasselgren, Zhi-Hao Lin, Jun Gao, Alexander Keller, Nandita Vijaykumar, Sanja Fidler, Zian Wang*
@article{DiffusionRenderer,
author = {Ruofan Liang and Zan Gojcic and Huan Ling and Jacob Munkberg and
Jon Hasselgren and Zhi-Hao Lin and Jun Gao and Alexander Keller and
Nandita Vijaykumar and Sanja Fidler and Zian Wang},
title = {DiffusionRenderer: Neural Inverse and Forward Rendering with Video Diffusion Models},
journal = {arXiv preprint arXiv:2501.18590},
year = {2025}
}
The authors thank Shiqiu Liu, Yichen Sheng, and Michael Kass for their insightful discussions that contributed to this project. We also appreciate the discussions with Xuanchi Ren, Tianchang Shen and Zheng Zeng during the model development process.