
DiffusionRenderer: Neural Inverse and Forward Rendering
with Video Diffusion Models

Ruofan Liang1,2,3*     Zan Gojcic1     Huan Ling1,2,3     Jacob Munkberg1     Jon Hasselgren1     Zhi-Hao Lin1,4
Jun Gao1     Alexander Keller1     Nandita Vijaykumar2,3     Sanja Fidler1,2,3     Zian Wang1,2,3 *    
1NVIDIA     2University of Toronto     3Vector Institute     4University of Illinois Urbana-Champaign     * equal contribution

DiffusionRenderer is a general-purpose method for both neural inverse and forward rendering. From input images or videos, it accurately estimates geometry and material buffers, and generates photorealistic images under specified lighting conditions, offering fundamental tools for image editing applications.

Abstract


Understanding and modeling lighting effects are fundamental tasks in computer vision and graphics. Classic physically-based rendering (PBR) accurately simulates light transport, but relies on precise scene representations (explicit 3D geometry, high-quality material properties, and lighting conditions) that are often impractical to obtain in real-world scenarios. Therefore, we introduce DiffusionRenderer, a neural approach that addresses the dual problem of inverse and forward rendering within a holistic framework. Leveraging powerful video diffusion model priors, the inverse rendering model accurately estimates G-buffers from real-world videos, providing an interface for image editing tasks and training data for the rendering model. Conversely, our rendering model generates photorealistic images from G-buffers without explicit light transport simulation. Experiments demonstrate that DiffusionRenderer effectively approximates inverse and forward rendering, consistently outperforming the state of the art. Our model enables practical applications from a single video input, including relighting, material editing, and realistic object insertion.


Method overview. Given an input video, the neural inverse renderer estimates geometry and material properties per pixel. It generates one scene attribute at a time, with a domain embedding indicating the target attribute to generate. Conversely, the neural forward renderer produces photorealistic images given lighting information, geometry, and material buffers. The lighting condition is injected into the base video diffusion model through cross-attention layers. During joint training on both synthetic and real data, we use an optimizable LoRA for the real data sources.
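As a rough illustration of the lighting injection described above, the sketch below shows how video latent tokens could attend to environment-map tokens through a residual cross-attention layer. This is a minimal, hypothetical PyTorch example; the module name LightingCrossAttention, the feature dimension, and the token layout are assumptions, not the released implementation.

import torch
import torch.nn as nn

class LightingCrossAttention(nn.Module):
    """Toy residual cross-attention block: video tokens query lighting tokens."""
    def __init__(self, dim: int = 320, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, video_tokens: torch.Tensor, light_tokens: torch.Tensor) -> torch.Tensor:
        # video_tokens: (B, N, dim) flattened latent tokens of the video diffusion backbone
        # light_tokens: (B, M, dim) tokens encoding the environment map
        out, _ = self.attn(query=self.norm(video_tokens), key=light_tokens, value=light_tokens)
        return video_tokens + out  # residual injection of lighting information

layer = LightingCrossAttention()
video = torch.randn(1, 1024, 320)   # spatio-temporal latent tokens
light = torch.randn(1, 64, 320)     # encoded environment-map tokens
print(layer(video, light).shape)    # torch.Size([1, 1024, 320])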

Motivation


DiffusionRenderer is designed to address the inverse and forward rendering problems jointly, overcoming limitations of classic physically-based rendering (PBR) methods.

Classic PBR relies on explicit 3D geometry such as meshes. When such geometry is unavailable, screen-space ray tracing (SSRT) struggles to reproduce shadows and reflections accurately. Our forward renderer synthesizes photorealistic lighting effects without explicit path tracing or 3D geometry.

PBR is also sensitive to errors in G-buffers. SSRT with G-buffers estimated by state-of-the-art inverse rendering models often fails to deliver high-quality results. Our forward renderer is trained to tolerate noisy G-buffers.

Forward Rendering


Video generation from G-buffers. The forward renderer generates accurate shadows and reflections that are consistent across viewpoints. Notably, these lighting effects are synthesized entirely from an environment map, despite the input G-buffers containing no explicit shadow or reflection information.
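For intuition, one plausible way to expose an environment map to such a renderer is to pair a log-compressed radiance map with per-texel world directions and flatten the result into lighting tokens. The encoding below, including the function name envmap_to_tokens, is an assumption for illustration only and not the paper's exact conditioning.

import numpy as np

def envmap_to_tokens(env: np.ndarray) -> np.ndarray:
    """env: (H, W, 3) HDR lat-long map -> (H*W, 6) tokens of [log-radiance, direction]."""
    h, w, _ = env.shape
    log_rad = np.log1p(env)                          # compress the HDR range
    # Per-texel spherical directions for the lat-long parameterization.
    theta = (np.arange(h) + 0.5) / h * np.pi         # polar angle in [0, pi]
    phi = (np.arange(w) + 0.5) / w * 2.0 * np.pi     # azimuth in [0, 2*pi]
    phi, theta = np.meshgrid(phi, theta)             # (H, W) grids
    dirs = np.stack([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)], axis=-1)        # (H, W, 3) unit vectors
    return np.concatenate([log_rad, dirs], axis=-1).reshape(h * w, 6)

tokens = envmap_to_tokens(np.random.rand(64, 128, 3).astype(np.float32) * 10.0)
print(tokens.shape)  # (8192, 6)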

Video comparisons across three scenes; each shows the input G-buffers and environment map alongside Split Sum, SSRT, DiLightNet, RGB↔X, Ours, and the Reference.

Qualitative comparison of forward rendering. Our method generates high-quality inter-reflections and shadows, producing more accurate results than the neural baselines.

Inverse Rendering


The inverse renderer offers a general-purpose solution for de-lighting, producing accurate and temporally consistent scene attributes such as surface normals, albedo, roughness, and metallic maps.
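A schematic sketch of this per-attribute generation, assuming a single shared network whose output is selected by a learned domain embedding. The toy model below (ToyInverseRenderer, ATTRIBUTES) is illustrative only and far simpler than the actual video diffusion backbone.

import torch
import torch.nn as nn

ATTRIBUTES = ["normal", "albedo", "roughness", "metallic"]

class ToyInverseRenderer(nn.Module):
    def __init__(self, channels: int = 3, embed_dim: int = 32):
        super().__init__()
        # One learned embedding per target attribute (the "domain embedding").
        self.domain_embed = nn.Embedding(len(ATTRIBUTES), embed_dim)
        self.net = nn.Conv2d(channels + embed_dim, channels, kernel_size=3, padding=1)

    def forward(self, frames: torch.Tensor, domain_id: int) -> torch.Tensor:
        b, c, h, w = frames.shape
        emb = self.domain_embed(torch.tensor([domain_id]))   # (1, embed_dim)
        emb = emb.view(1, -1, 1, 1).expand(b, -1, h, w)      # broadcast spatially
        return self.net(torch.cat([frames, emb], dim=1))     # toy "denoiser"

model = ToyInverseRenderer()
video = torch.randn(4, 3, 64, 64)  # four frames stand in for a short clip
gbuffers = {name: model(video, i) for i, name in enumerate(ATTRIBUTES)}
print({k: tuple(v.shape) for k, v in gbuffers.items()})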

Relighting


We demonstrate the effectiveness of our combined inverse and forward rendering models on the relighting task: the G-buffers estimated by the inverse renderer are passed to the forward renderer to relight the scene under new lighting conditions.
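Putting the two models together, relighting amounts to inverse rendering followed by forward rendering under the target illumination. The sketch below only illustrates this data flow; estimate_gbuffers and render_with_lighting are hypothetical zero-filled stand-ins, not the released API.

from typing import Dict
import numpy as np

def estimate_gbuffers(video: np.ndarray) -> Dict[str, np.ndarray]:
    # Placeholder: the inverse renderer would predict these buffers per pixel.
    t, h, w, _ = video.shape
    return {k: np.zeros((t, h, w, 3), dtype=np.float32)
            for k in ("normal", "albedo", "roughness", "metallic")}

def render_with_lighting(gbuffers: Dict[str, np.ndarray],
                         env_map: np.ndarray) -> np.ndarray:
    # Placeholder: the forward renderer would synthesize shading, shadows,
    # and reflections consistent with env_map.
    return gbuffers["albedo"].copy()

video = np.zeros((16, 256, 256, 3), dtype=np.float32)   # input frames
target_env = np.zeros((128, 256, 3), dtype=np.float32)  # lat-long HDR map
relit = render_with_lighting(estimate_gbuffers(video), target_env)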


Relighting comparison panels: Source, Target Env. Map, DiLightNet, Neural Gaffer, Ours, Ground Truth.

Applications


Relighting Real Scenes


  • Comparison with reconstruction-based inverse rendering and relighting methods




Material Editing

Material editing panels: Edited Metallic, Edited Roughness, Our Rendering, Ground Truth.
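Material editing follows the same data flow, presumably by modifying the estimated roughness and metallic buffers before re-rendering; the snippet below continues the hypothetical relighting sketch above and reuses its stand-in functions and variables.

import numpy as np

gbuffers = estimate_gbuffers(video)                                     # from the input video
gbuffers["roughness"] = np.clip(gbuffers["roughness"] * 0.3, 0.0, 1.0)  # glossier surfaces
gbuffers["metallic"] = np.ones_like(gbuffers["metallic"])               # fully metallic
edited = render_with_lighting(gbuffers, target_env)                     # same lighting, new materials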

Object Insertion

Object insertion results, shown with and without the inserted virtual object for comparison.

Paper



DiffusionRenderer: Neural Inverse and Forward Rendering with Video Diffusion Models

Ruofan Liang*, Zan Gojcic, Huan Ling, Jacob Munkberg, Jon Hasselgren, Zhi-Hao Lin, Jun Gao, Alexander Keller, Nandita Vijaykumar, Sanja Fidler, Zian Wang*

arXiv · Paper · Supp Video · Demo Video

BibTeX


@article{DiffusionRenderer,
    author = {Ruofan Liang and Zan Gojcic and Huan Ling and Jacob Munkberg and 
        Jon Hasselgren and Zhi-Hao Lin and Jun Gao and Alexander Keller and 
        Nandita Vijaykumar and Sanja Fidler and Zian Wang},
    title = {DiffusionRenderer: Neural Inverse and Forward Rendering with Video Diffusion Models},
    journal = {arXiv preprint arXiv:2501.18590},
    year = {2025}
}