
UniRelight: Learning Joint Decomposition and Synthesis
for Video Relighting

Kai He1,2,3       Ruofan Liang1,2,3       Jacob Munkberg1       Jon Hasselgren1       Nandita Vijaykumar2,3
Alexander Keller1       Sanja Fidler1,2,3       Igor Gilitschenski2,3 †       Zan Gojcic1 †       Zian Wang1,2,3 †
1NVIDIA       2University of Toronto       3Vector Institute       † joint advising

UniRelight is a relighting framework that jointly models the distribution of scene intrinsics and illumination. It enables high-quality relighting and intrinsic decomposition from a single input image or video, producing temporally consistent shadows, reflections, and transparency, and outperforms state-of-the-art methods.

Abstract


We address the challenge of relighting a single image or video, a task that demands precise scene intrinsic understanding and high-quality light transport synthesis. Existing end-to-end relighting models are often limited by the scarcity of paired multi-illumination data, restricting their ability to generalize across diverse scenes. Conversely, two-stage pipelines that combine inverse and forward rendering can mitigate data requirements but are susceptible to error accumulation and often fail to produce realistic outputs under complex lighting conditions or with sophisticated materials. In this work, we introduce a general-purpose approach that jointly estimates albedo and synthesizes relit outputs in a single pass, harnessing the generative capabilities of video diffusion models. This joint formulation enhances implicit scene comprehension and facilitates the creation of realistic lighting effects and intricate material interactions, such as shadows, reflections, and transparency. Trained on synthetic multi-illumination data and extensive automatically labeled real-world videos, our model demonstrates strong generalization across diverse domains and surpasses previous methods in both visual fidelity and temporal consistency.


Method overview. Given an input video and a target lighting configuration, our method jointly predicts a relit video and its corresponding albedo. We use a pretrained VAE encoder-decoder pair to map input and output videos to a latent space. The latents for the target relit video and albedo are concatenated along the temporal dimension with the encoded input video. Lighting features derived from the environment maps are concatenated along the channel dimension with the relit video latent. A finetuned DiT video model denoises the joint latent, enabling consistent generation of both relit appearance and intrinsic decomposition.
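
To make the latent arrangement concrete, below is a minimal PyTorch-style sketch of the joint conditioning described above. The tensor shapes, the lighting-feature representation, and the zero-padding of the non-relit streams are assumptions made for illustration; this is not the released implementation.

# Minimal sketch of the joint latent layout (hypothetical shapes and modules).
import torch

B, T, Cl, h, w = 1, 16, 16, 32, 32        # batch, frames, latent channels, latent grid

# Latents from the pretrained VAE encoder (random placeholders here).
z_input  = torch.randn(B, T, Cl, h, w)    # encoded input video (conditioning)
z_relit  = torch.randn(B, T, Cl, h, w)    # target relit-video latent (noised in training)
z_albedo = torch.randn(B, T, Cl, h, w)    # target albedo latent (noised in training)

# Lighting features derived from the target environment map, aligned with the
# latent grid (placeholder for the paper's lighting encoding).
light_feat = torch.randn(B, T, Cl, h, w)

# Channel-wise concatenation of lighting features with the relit-video latent only;
# the other streams are zero-padded so all three share the same channel count
# (the padding scheme is an assumption of this sketch).
pad = torch.zeros_like(light_feat)
relit_stream  = torch.cat([z_relit,  light_feat], dim=2)
albedo_stream = torch.cat([z_albedo, pad],        dim=2)
input_stream  = torch.cat([z_input,  pad],        dim=2)

# Temporal concatenation forms the joint latent that the finetuned DiT denoises,
# producing the relit video and its albedo in a single pass.
joint_latent = torch.cat([input_stream, relit_stream, albedo_stream], dim=1)
print(joint_latent.shape)                 # torch.Size([1, 48, 32, 32, 32])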

Motivation


Our key insight is to jointly model relighting and albedo estimation. Demodulation provides a strong prior for the relighting task, improving generalization and reducing shadow-baking artifacts.
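
As a rough illustration of what demodulation means here, the sketch below divides an image by its albedo to recover a shading term, following the standard appearance = albedo × shading decomposition; the epsilon guard and clamp bound are illustrative choices, not the paper's exact procedure.

# Illustrative demodulation: appearance ≈ albedo * shading, so dividing the
# image by the albedo isolates shading (shadows, highlights) from base color.
import torch

def demodulate(image: torch.Tensor, albedo: torch.Tensor, eps: float = 1e-3) -> torch.Tensor:
    # Guard against division by near-zero albedo; the clamp bound is arbitrary.
    return (image / albedo.clamp(min=eps)).clamp(max=10.0)

image   = torch.rand(3, 256, 256)         # observed frame
albedo  = torch.rand(3, 256, 256)         # estimated base color
shading = demodulate(image, albedo)       # illumination-dependent component
recon   = albedo * shading                # multiplying back reconstructs the frame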

This joint formulation encourages the model to learn an internal representation of scene structure, leading to improved generalization across diverse and unseen domains.

Joint Estimation


Joint estimation of albedo and relighting. Our method produces high-quality albedo and relighting results with realistic specular highlights and shadows under target lighting conditions.

Relighting Results


UniRelight produces high-quality albedo and relighting results with realistic specular highlights and shadows under target lighting conditions on real-world videos.

Comparison


Qualitative comparison on in-the-wild data. Our method generates more plausible results than the baselines, with higher quality and a more realistic appearance. In particular, on complex materials such as anisotropic surfaces, glass, and transparent objects, the previous state-of-the-art method DiffusionRenderer struggles to represent the materials accurately, leading to suboptimal results.

Application: Illumination Augmentation


Our model's strong generalization enables effective data augmentation across different scenarios. We show diverse samples generated by our model on driving scenes, including nighttime and dusk, demonstrating that it captures the illumination distribution and can sample realistic relighting results under varying lighting conditions.




Paper



UniRelight: Learning Joint Decomposition and Synthesis for Video Relighting

Kai He, Ruofan Liang, Jacob Munkberg, Jon Hasselgren, Nandita Vijaykumar, Alexander Keller, Sanja Fidler, Igor Gilitschenski, Zan Gojcic, Zian Wang

arXiv
Paper
Supp Video

BibTeX


@misc{he2025unirelight,
    title={UniRelight: Learning Joint Decomposition and Synthesis for Video Relighting},
    author={Kai He and Ruofan Liang and Jacob Munkberg and Jon Hasselgren and Nandita Vijaykumar 
        and Alexander Keller and Sanja Fidler and Igor Gilitschenski and Zan Gojcic and Zian Wang},
    year={2025},
    eprint={2506.15673},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Acknowledgment


The authors thank Tianshi Cao and Huan Ling for their insightful discussions that contributed to this project.