DiffusionRenderer is a general-purpose method for both neural inverse and forward rendering.
From input images
or videos, it accurately estimates geometry and material buffers, and generates photorealistic images under specified lighting conditions,
offering fundamental tools for image editing applications.
Abstract
Understanding and modeling lighting effects are fundamental tasks in computer vision and graphics. Classic physically-based rendering (PBR)
accurately simulates light transport, but relies on precise scene representations—explicit 3D geometry, high-quality material properties,
and lighting conditions—that are often impractical to obtain in real-world scenarios. Therefore, we introduce DiffusionRenderer, a neural
approach that addresses the dual problem of inverse and forward rendering within a holistic framework. Leveraging powerful video
diffusion model priors, the inverse rendering model accurately estimates G-buffers from real-world videos, providing an interface for image
editing tasks, and training data for the rendering model. Conversely, our rendering model generates photorealistic images from G-buffers without
explicit light transport simulation. Experiments demonstrate that DiffusionRenderer effectively approximates inverse and forward rendering,
consistently outperforming the state-of-the-art. Our model enables practical applications from a single video input—including relighting,
material editing, and realistic object insertion.
Method overview.
Given an input video, the neural inverse renderer estimates geometry and material properties per pixel. It generates one scene attribute at a time, with the domain embedding indicating the target attribute to generate.
Conversely, the neural forward renderer produces photorealistic images given lighting information, geometry, and material buffers. The lighting condition is injected into the base video diffusion model through cross-attention layers. During joint training with both synthetic and real data, we use an optimizable LoRA for real data sources.
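The two conditioning paths described above, a domain embedding that selects which scene attribute to generate and lighting injected through cross-attention, can be pictured with a minimal PyTorch sketch. The module names, the attribute list, and the 768-dimensional token size below are illustrative assumptions rather than the released implementation.

import torch
import torch.nn as nn

# Hypothetical attribute vocabulary for the domain embedding (assumption).
G_BUFFER_DOMAINS = ["normal", "albedo", "roughness", "metallic", "depth"]

class DomainEmbedding(nn.Module):
    """Maps the name of the target scene attribute to a conditioning vector."""
    def __init__(self, dim: int = 768):
        super().__init__()
        self.table = nn.Embedding(len(G_BUFFER_DOMAINS), dim)

    def forward(self, domain: str) -> torch.Tensor:
        idx = torch.tensor([G_BUFFER_DOMAINS.index(domain)])
        return self.table(idx)                       # shape (1, dim)

class LightingCrossAttention(nn.Module):
    """Injects environment-map tokens into video latents via cross-attention."""
    def __init__(self, dim: int = 768, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, latents: torch.Tensor, env_tokens: torch.Tensor) -> torch.Tensor:
        # latents:    (B, N_latent_tokens, dim), noisy video latents
        # env_tokens: (B, N_env_tokens, dim), encoded environment-map lighting
        out, _ = self.attn(query=latents, key=env_tokens, value=env_tokens)
        return latents + out                         # residual injection

if __name__ == "__main__":
    latents, env_tokens = torch.randn(1, 256, 768), torch.randn(1, 64, 768)
    print(DomainEmbedding()("albedo").shape)                       # (1, 768)
    print(LightingCrossAttention()(latents, env_tokens).shape)     # (1, 256, 768)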
Motivation
DiffusionRenderer is motivated by the need to treat the inverse and forward rendering problems jointly, overcoming limitations of classic physically-based rendering (PBR) methods.
Classic PBR relies on explicit 3D geometry such as meshes. When such geometry is not available, screen-space ray tracing (SSRT) struggles to accurately represent shadows and reflections. Our forward renderer synthesizes photorealistic lighting effects without explicit path tracing or 3D geometry.
PBR is also sensitive to errors in G-buffers: SSRT with estimated G-buffers from state-of-the-art inverse rendering models often fails to deliver high-quality results. Our forward renderer is trained to tolerate noisy buffers.
Forward Rendering
Video generation from G-buffers.
The forward renderer generates accurate shadows and reflections that are consistent across viewpoints. Notably, these lighting effects are synthesized entirely from an environment map, despite the input G-buffers containing no explicit shadow or reflection information.
Figure panels (per scene): G-buffer & environment map, SplitSum, SSRT, DiLightNet, RGB↔X, Ours, Reference.
Qualitative comparison of forward rendering. Our method generates high-quality inter-reflections and shadows, producing more accurate results than the neural baselines.
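Since the comparisons above condition the renderer only on G-buffers and an HDR environment map, a natural question is how that map enters the model. The sketch below shows one plausible way to tokenize a lat-long HDR environment map for the cross-attention conditioning; the encoder architecture, the clamped/log split, and all dimensions are assumptions for illustration, not the paper's exact design.

import torch
import torch.nn as nn

class EnvMapEncoder(nn.Module):
    """Encodes a lat-long HDR environment map into a sequence of conditioning tokens."""
    def __init__(self, dim: int = 768):
        super().__init__()
        # Clamped and log-encoded copies of the HDR map are stacked so that both
        # overall color and very bright light sources are represented (6 channels).
        self.conv = nn.Sequential(
            nn.Conv2d(6, 128, kernel_size=4, stride=4),
            nn.SiLU(),
            nn.Conv2d(128, dim, kernel_size=4, stride=4),
        )

    def forward(self, env_hdr: torch.Tensor) -> torch.Tensor:
        # env_hdr: (B, 3, H, W), linear HDR radiance in a lat-long layout.
        ldr = env_hdr.clamp(0, 1)                            # limited-range view of the lighting
        log_hdr = torch.log1p(env_hdr)                       # compresses bright light sources
        feat = self.conv(torch.cat([ldr, log_hdr], dim=1))   # (B, dim, h, w)
        return feat.flatten(2).transpose(1, 2)               # (B, h*w, dim) tokens

if __name__ == "__main__":
    tokens = EnvMapEncoder()(torch.rand(1, 3, 128, 256) * 10.0)  # synthetic HDR map
    print(tokens.shape)                                          # (1, 128, 768)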
Inverse Rendering
The inverse renderer offers a general-purpose solution for de-lighting, producing accurate and temporally consistent scene attributes such as surface normals, albedo, roughness, and metallic maps.
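In practice, the same inverse model is sampled once per target attribute, switching only the domain embedding. Below is a minimal sketch of that loop, with a hypothetical run_inverse_renderer standing in for the conditional video-diffusion sampling pass.

import torch

G_BUFFER_DOMAINS = ["normal", "albedo", "roughness", "metallic"]

def run_inverse_renderer(video: torch.Tensor, domain: str) -> torch.Tensor:
    """Hypothetical placeholder for one conditional sampling pass of the inverse model."""
    # A real implementation would denoise latents conditioned on the input video
    # and the embedding of `domain`, then decode the result with the VAE.
    return torch.zeros_like(video)

def estimate_g_buffers(video: torch.Tensor) -> dict:
    """Runs the inverse renderer once per scene attribute."""
    return {d: run_inverse_renderer(video, d) for d in G_BUFFER_DOMAINS}

if __name__ == "__main__":
    video = torch.rand(16, 3, 256, 256)              # (frames, channels, H, W)
    buffers = estimate_g_buffers(video)
    print({k: tuple(v.shape) for k, v in buffers.items()})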
Relighting
We demonstrate the effectiveness of our combined inverse and forward rendering model in the relighting task.
We use the estimated G-buffers from the inverse renderer to relight the scene with different lighting conditions.
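As a rough outline, the relighting workflow chains the two models: estimate G-buffers with the inverse renderer, then feed them to the forward renderer together with the new environment map. The inverse_renderer and forward_renderer callables below are hypothetical stand-ins for the two diffusion models; names and shapes are assumptions.

import torch

def relight(video: torch.Tensor, inverse_renderer, forward_renderer,
            target_env_map: torch.Tensor) -> torch.Tensor:
    """De-lights the input video, then re-renders it under the target lighting."""
    g_buffers = inverse_renderer(video)                  # normals, albedo, roughness, metallic
    return forward_renderer(g_buffers, target_env_map)   # relit, photorealistic video

if __name__ == "__main__":
    # Dummy stand-ins so the sketch runs end to end.
    fake_inverse = lambda v: {k: torch.zeros_like(v)
                              for k in ("normal", "albedo", "roughness", "metallic")}
    fake_forward = lambda bufs, env: bufs["albedo"]      # placeholder "render"
    video = torch.rand(16, 3, 256, 256)                  # (frames, channels, H, W)
    env = torch.rand(3, 128, 256)                        # HDR lat-long environment map
    print(relight(video, fake_inverse, fake_forward, env).shape)   # (16, 3, 256, 256)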
@article{DiffusionRenderer,
  author  = {Ruofan Liang and Zan Gojcic and Huan Ling and Jacob Munkberg and
             Jon Hasselgren and Zhi-Hao Lin and Jun Gao and Alexander Keller and
             Nandita Vijaykumar and Sanja Fidler and Zian Wang},
  title   = {DiffusionRenderer: Neural Inverse and Forward Rendering with Video Diffusion Models},
  journal = {arXiv preprint arXiv:2501.18590},
  year    = {2025}
}