
UniRelight: Learning Joint Decomposition and Synthesis
for Video Relighting

Kai He1,2,3       Ruofan Liang1,2,3       Jacob Munkberg1       Jon Hasselgren1       Nandita Vijaykumar2,3
Alexander Keller1       Sanja Fidler1,2,3       Igor Gilitschenski2,3 †       Zan Gojcic1 †       Zian Wang1,2,3 †
1NVIDIA       2University of Toronto       3Vector Institute       † joint advising

UniRelight is a relighting framework that jointly models the distribution of scene intrinsics and illumination. It enables high-quality relighting and intrinsic decomposition from a single input image or video, producing temporally consistent shadows, reflections, and transparency, and outperforms state-of-the-art methods.

Abstract


We address the challenge of relighting a single image or video, a task that demands precise scene intrinsic understanding and high-quality light transport synthesis. Existing end-to-end relighting models are often limited by the scarcity of paired multi-illumination data, restricting their ability to generalize across diverse scenes. Conversely, two-stage pipelines that combine inverse and forward rendering can mitigate data requirements but are susceptible to error accumulation and often fail to produce realistic outputs under complex lighting conditions or with sophisticated materials. In this work, we introduce a general-purpose approach that jointly estimates albedo and synthesizes relit outputs in a single pass, harnessing the generative capabilities of video diffusion models. This joint formulation enhances implicit scene comprehension and facilitates the creation of realistic lighting effects and intricate material interactions, such as shadows, reflections, and transparency. Trained on synthetic multi-illumination data and extensive automatically labeled real-world videos, our model demonstrates strong generalization across diverse domains and surpasses previous methods in both visual fidelity and temporal consistency.


Method overview. Given an input video and a target lighting configuration, our method jointly predicts a relit video and its corresponding albedo. We use a pretrained VAE encoder-decoder pair to map input and output videos to a latent space. The latents for the target relit video and albedo are concatenated along the temporal dimension with the encoded input video. Lighting features derived from the environment maps are concatenated along the channel dimension with the relit video latent. A finetuned DiT video model denoises the joint latent, enabling consistent generation of both relit appearance and intrinsic decomposition.
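
To make the latent arrangement concrete, below is a minimal PyTorch-style sketch of the joint conditioning described above. The tensor shapes, the lighting-feature representation, and the zero-padding of the non-relit streams are assumptions made for illustration; this is not the released implementation.

# Minimal sketch of the joint latent layout (hypothetical shapes and modules).
import torch

B, T, Cl, h, w = 1, 16, 16, 32, 32        # batch, frames, latent channels, latent grid

# Latents from the pretrained VAE encoder (random placeholders here).
z_input  = torch.randn(B, T, Cl, h, w)    # encoded input video (conditioning)
z_relit  = torch.randn(B, T, Cl, h, w)    # target relit-video latent (noised in training)
z_albedo = torch.randn(B, T, Cl, h, w)    # target albedo latent (noised in training)

# Lighting features derived from the target environment map, aligned with the
# latent grid (placeholder for the paper's lighting encoding).
light_feat = torch.randn(B, T, Cl, h, w)

# Channel-wise concatenation of lighting features with the relit-video latent only;
# the other streams are zero-padded so all three share the same channel count
# (the padding scheme is an assumption of this sketch).
pad = torch.zeros_like(light_feat)
relit_stream  = torch.cat([z_relit,  light_feat], dim=2)
albedo_stream = torch.cat([z_albedo, pad],        dim=2)
input_stream  = torch.cat([z_input,  pad],        dim=2)

# Temporal concatenation forms the joint latent that the finetuned DiT denoises,
# producing the relit video and its albedo in a single pass.
joint_latent = torch.cat([input_stream, relit_stream, albedo_stream], dim=1)
print(joint_latent.shape)                 # torch.Size([1, 48, 32, 32, 32])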

Motivation


Our key insight is to jointly model relighting and albedo estimation. Demodulation provides a strong prior for the relighting task, improving generalization and reducing shadow-baking artifacts.
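
As a rough illustration of what demodulation means here, the sketch below divides an image by its albedo to recover a shading term, following the standard appearance = albedo × shading decomposition; the epsilon guard and clamp bound are illustrative choices, not the paper's exact procedure.

# Illustrative demodulation: appearance ≈ albedo * shading, so dividing the
# image by the albedo isolates shading (shadows, highlights) from base color.
import torch

def demodulate(image: torch.Tensor, albedo: torch.Tensor, eps: float = 1e-3) -> torch.Tensor:
    # Guard against division by near-zero albedo; the clamp bound is arbitrary.
    return (image / albedo.clamp(min=eps)).clamp(max=10.0)

image   = torch.rand(3, 256, 256)         # observed frame
albedo  = torch.rand(3, 256, 256)         # estimated base color
shading = demodulate(image, albedo)       # illumination-dependent component
recon   = albedo * shading                # multiplying back reconstructs the frame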

This joint formulation encourages the model to learn an internal representation of scene structure, leading to improved generalization across diverse and unseen domains.

Joint Estimation


Joint estimation of albedo and relighting. Our method produces high-quality albedo and relighting results with realistic specular highlights and shadows under target lighting conditions.

Relighting Results


UniRelight produces high-quality albedo and relighting results with realistic specular highlights and shadows under target lighting conditions on real-world videos.

Comparison


Qualitative comparison on in-the-wild data. Our method generates more plausible results than the baselines, with higher quality and a more realistic appearance. In particular, on complex materials such as anisotropic surfaces, glass, and transparent objects, the previous state-of-the-art method DiffusionRenderer struggles to represent the materials accurately, leading to suboptimal results.

Application: Illumination Augmentation


Our model's strong generalization enables effective data augmentation across different scenarios. We show diverse samples generated by our model on driving scenes, including nighttime and dusk, demonstrating that it captures the illumination distribution and can sample realistic relighting results under varying lighting conditions.




Paper



UniRelight: Learning Joint Decomposition and Synthesis for Video Relighting

Kai He, Ruofan Liang, Jacob Munkberg, Jon Hasselgren, Nandita Vijaykumar, Alexander Keller, Sanja Fidler, Igor Gilitschenski, Zan Gojcic, Zian Wang

arXiv
Paper
Supp Video

BibTeX


@misc{he2025unirelight,
    title={UniRelight: Learning Joint Decomposition and Synthesis for Video Relighting},
    author={Kai He and Ruofan Liang and Jacob Munkberg and Jon Hasselgren and Nandita Vijaykumar 
        and Alexander Keller and Sanja Fidler and Igor Gilitschenski and Zan Gojcic and Zian Wang},
    year={2025},
    eprint={2506.15673},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Acknowledgment


The authors thank Tianshi Cao and Huan Ling for their insightful discussions that contributed to this project.