HorizonRelight: Relighting Long-horizon Videos Consistently via Diffusion Transformers
¹NVIDIA ²University of Southern California *Corresponding author
ECCV 2026
Overview Video
Abstract
Diffusion-based video relighting enables controllable relighting from a single input video. However, modern video diffusion backbones are trained on short clips and are applied to long-horizon videos through chunked sliding-window inference, which often causes temporal discontinuities at chunk boundaries. We address this by reframing long-horizon relighting as temporally conditioned latent domain translation. Our framework enforces cross-chunk continuity by propagating target-domain latents across chunk boundaries, and it makes this behavior learnable through masked target-domain self-conditioning, which trains the model to continue from temporally masked propagated context. We further introduce warm-start prompting with a relit prompt anchor produced by a controllable generative model; the anchor establishes the initial target-domain state and provides a general interface for prompt-based relighting. Experiments on in-the-wild long-horizon videos show markedly improved temporal consistency: chunk-boundary artifacts are largely reduced, and unwanted appearance changes across chunks are greatly suppressed.
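As a rough illustration of masked target-domain self-conditioning, the sketch below builds one training example by keeping a random-length prefix of the ground-truth target latents as "propagated context" and zeroing the rest. The function name `sample_masked_context`, the prefix-shaped mask, the zero fill, and the `max_context` parameter are all assumptions made for illustration; the abstract does not specify how the temporal mask is sampled.

```python
import numpy as np


def sample_masked_context(target_latents, max_context, rng):
    """Return (context, known) for one training clip.

    target_latents: array of shape (T, ...) holding target-domain latents.
    max_context:    hypothetical cap on how many leading frames are revealed.
    rng:            a numpy.random.Generator.
    """
    num_frames = target_latents.shape[0]
    # Reveal between 0 and max_context leading frames, always leaving
    # at least one frame for the model to predict (assumed scheme).
    upper = min(max_context, num_frames - 1)
    k = int(rng.integers(0, upper + 1))

    known = np.zeros(num_frames, dtype=bool)
    known[:k] = True

    # Masked frames are zero-filled; only revealed frames carry target latents.
    context = np.zeros_like(target_latents)
    context[known] = target_latents[known]
    return context, known


rng = np.random.default_rng(0)
target = np.ones((8, 3))
context, known = sample_masked_context(target, max_context=4, rng=rng)
```

Sampling `k = 0` some of the time would correspond to the first chunk of a video, where no propagated context exists yet.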
Method
Each sliding-window chunk is conditioned on target-domain latents propagated from the previous chunk, so the relit appearance is continued rather than re-inferred independently for each chunk.
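The chunk-to-chunk propagation described above can be sketched as a simple inference loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `chunk_starts`, `relight_long_video`, the `denoise_chunk(src, context, known)` callback signature, the overlap handling, and the `toy_denoiser` stand-in are all hypothetical. A real system would call the diffusion transformer where the toy function is used.

```python
import numpy as np


def chunk_starts(num_frames, chunk_len, overlap):
    """Start indices of sliding windows that share at least `overlap` frames."""
    if not 0 <= overlap < chunk_len:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_len")
    stride = chunk_len - overlap
    starts = [0]
    while starts[-1] + chunk_len < num_frames:
        # Clamp the final window so it ends exactly at the last frame.
        starts.append(min(starts[-1] + stride, num_frames - chunk_len))
    return starts


def relight_long_video(source_latents, denoise_chunk, chunk_len, overlap, anchor=None):
    """Relight chunk by chunk, conditioning each chunk on already-relit frames."""
    num_frames = source_latents.shape[0]
    target = np.zeros_like(source_latents)
    done = 0  # number of leading frames that already have relit latents

    for start in chunk_starts(num_frames, chunk_len, overlap):
        end = min(start + chunk_len, num_frames)
        src = source_latents[start:end]
        context = np.zeros_like(src)
        known = np.zeros(end - start, dtype=bool)

        if done > start:
            # Propagate target-domain latents from the previous chunk.
            n = done - start
            context[:n] = target[start:done]
            known[:n] = True
        elif anchor is not None:
            # First chunk: use the warm-start anchor as the initial target state.
            context[0] = anchor
            known[0] = True

        out = denoise_chunk(src, context, known)
        # Write only frames that were not relit by an earlier chunk.
        new_from = max(done, start) - start
        target[start + new_from:end] = out[new_from:]
        done = end
    return target


def toy_denoiser(src, context, known):
    """Hypothetical stand-in: copies the brightness shift seen in the context."""
    shift = (context[known] - src[known]).mean() if known.any() else 0.0
    return src + shift


source = np.arange(22, dtype=float).reshape(11, 2)
relit = relight_long_video(source, toy_denoiser, chunk_len=4, overlap=1,
                           anchor=source[0] + 5.0)
```

In this toy setup every chunk inherits the shift established by the anchor, so the relit state stays constant across all chunk boundaries. Dropping the context (so that `known` is all `False`) would let each chunk choose its own shift, which is the independent re-inference that causes boundary discontinuities.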
Results
Prompt-based relighting and long-horizon continuation produce stable relit videos under diverse target lighting while preserving source content and temporal consistency.
Comparison on Temporal Consistency
Temporal comparisons evaluate consistency for both inverse decomposition and forward relighting across long-horizon video segments.
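The page does not state which metric underlies these comparisons. As one simple, hypothetical diagnostic, the sketch below compares the mean frame-to-frame change at chunk boundaries with the mean change inside chunks; a ratio well above 1 suggests visible boundary jumps. The function name `boundary_jump_ratio` and the choice of mean absolute difference are assumptions for illustration only.

```python
import numpy as np


def boundary_jump_ratio(frames, boundaries, eps=1e-8):
    """Ratio of mean frame change at chunk boundaries to mean change elsewhere.

    frames:     array of shape (T, ...) with pixel or latent values.
    boundaries: indices of the first frame of each new chunk (1 <= b <= T - 1).
    """
    frames = np.asarray(frames, dtype=np.float64)
    num_frames = frames.shape[0]
    idx = np.asarray(boundaries, dtype=int)
    if idx.size == 0 or idx.min() < 1 or idx.max() > num_frames - 1:
        raise ValueError("boundaries must be non-empty and lie in [1, T - 1]")

    # diffs[i] is the mean absolute change between frame i and frame i + 1.
    diffs = np.abs(frames[1:] - frames[:-1]).reshape(num_frames - 1, -1).mean(axis=1)

    is_boundary = np.zeros(num_frames - 1, dtype=bool)
    is_boundary[idx - 1] = True
    if is_boundary.all():
        raise ValueError("need at least one non-boundary transition")

    return diffs[is_boundary].mean() / (diffs[~is_boundary].mean() + eps)


smooth = np.arange(8, dtype=float)   # steady ramp, no jump
jumpy = smooth.copy()
jumpy[4:] += 10.0                    # abrupt change where a new chunk starts
smooth_ratio = boundary_jump_ratio(smooth, [4])
jumpy_ratio = boundary_jump_ratio(jumpy, [4])
```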
Video Relighting
We use Nano Banana Pro to generate a warm-start prompt anchor for the first frame, then propagate the relit target-domain state across the video under diverse target illuminations.
Editing
We use Nano Banana Pro to create edited first-frame warm starts, and HorizonRelight propagates the prompted style while maintaining long-horizon temporal coherence.
Ablation on Stretched Warm-start Prompting
Nano Banana Pro warm-start images can be spatially stretched relative to the predicted structural cues. This ablation compares results with the prompting-frame G-buffer disabled and enabled for the warm-start frame.
BibTeX
@inproceedings{yang2026horizonrelight,
  title         = {HorizonRelight: Relighting Long-horizon Videos Consistently via Diffusion Transformers},
  author        = {Yang, Jing and Jaiswal, Mayoore and Wang, Zian and Zeng, Steven and Pereira, Rochelle and Zhao, Yajie and Min, Jianyuan},
  booktitle     = {European Conference on Computer Vision (ECCV)},
  eprint        = {2606.29095},
  archivePrefix = {arXiv},
  url           = {https://arxiv.org/abs/2606.29095},
  year          = {2026}
}