HorizonRelight: Relighting Long-horizon Videos Consistently via Diffusion Transformers

Jing Yang¹,², Mayoore Jaiswal¹, Zian Wang¹, Steven Zeng¹, Rochelle Pereira¹, Yajie Zhao², Jianyuan Min¹,*

¹NVIDIA ²University of Southern California *Corresponding author

ECCV 2026

Paper | arXiv | Code (coming soon) | BibTeX

HorizonRelight relighting examples across long-horizon videos. HorizonRelight delivers temporally consistent long-horizon relighting by propagating target-domain context across sliding-window chunks.

Overview Video

Abstract

Diffusion-based video relighting enables controllable relighting from a single input video, but modern video diffusion backbones are trained on short clips and applied to long-horizon videos through chunked sliding-window inference, often causing temporal discontinuities at chunk boundaries. We address this by reframing long-horizon relighting as temporally conditioned latent domain translation. Our framework enforces cross-chunk continuity by propagating target-domain latents across boundaries and makes this behavior learnable using masked target-domain self-conditioning, training the model to continue from temporally masked propagated context. We further introduce warm-start prompting with a relit prompt anchor from a controllable generative model, which establishes the initial target-domain state and creates a general interface for prompt-based relighting. Experiments on in-the-wild long-horizon videos show markedly improved temporal consistency, with chunk-boundary artifacts largely reduced and unwanted appearance changes across chunks greatly suppressed.

Method

Each sliding-window chunk is conditioned on target-domain latents propagated from the previous chunk, so the relit appearance is continued instead of re-inferred independently.

HorizonRelight diffusion renderer with propagated context.
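
The propagation loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `chunk_starts`, `relight_long_video`, and `toy_relight_chunk` are hypothetical names, the chunk length and overlap are arbitrary, and the toy function only stands in for the diffusion transformer by copying the lighting offset it finds in the propagated context. The optional `anchor` argument plays the role of a warm-start relit first frame.

```python
import numpy as np

def chunk_starts(num_frames, chunk_len, overlap):
    """Start indices of sliding windows that together cover every frame."""
    if not 0 <= overlap < chunk_len:
        raise ValueError("need 0 <= overlap < chunk_len")
    if num_frames <= chunk_len:
        return [0]
    stride = chunk_len - overlap
    starts = list(range(0, num_frames - chunk_len, stride))
    starts.append(num_frames - chunk_len)  # final window ends on the last frame
    return starts

def relight_long_video(source, relight_chunk, chunk_len=8, overlap=2, anchor=None):
    """Relight chunk by chunk, feeding already-relit frames back in as context.

    source: float array of shape (T, ...) holding per-frame source latents.
    relight_chunk(src, context, known) -> relit array with the shape of src.
    anchor: optional relit latent for frame 0 (assumed warm-start input).
    """
    num_frames = source.shape[0]
    output = np.zeros_like(source)
    done = np.zeros(num_frames, dtype=bool)
    if anchor is not None:
        output[0], done[0] = anchor, True
    for start in chunk_starts(num_frames, chunk_len, overlap):
        window = slice(start, min(start + chunk_len, num_frames))
        context, known = output[window].copy(), done[window].copy()
        relit = relight_chunk(source[window], context, known)
        # Frames committed by an earlier chunk are kept; only new frames are written.
        keep = known.reshape((-1,) + (1,) * (relit.ndim - 1))
        output[window] = np.where(keep, context, relit)
        done[window] = True
    return output

def toy_relight_chunk(src, context, known, rng):
    """Toy stand-in for the model: relighting is a single brightness offset."""
    if known.any():
        offset = (context[known] - src[known]).mean()  # continue the propagated look
    else:
        offset = rng.normal()  # no context: appearance is re-inferred from scratch
    return src + offset

rng = np.random.default_rng(0)
source = rng.normal(size=(20, 4, 2, 2))  # 20 frames of made-up latents
model = lambda s, c, k: toy_relight_chunk(s, c, k, rng)
relit = relight_long_video(source, model, chunk_len=8, overlap=2)
shift = (relit - source).reshape(len(source), -1).mean(axis=1)
print("per-frame offset spread:", shift.max() - shift.min())
```

In this toy, setting `overlap=0` with no anchor and a frame count that is a multiple of `chunk_len` makes every chunk draw its own offset, which mimics the appearance jumps of independent chunked inference.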

Results

Prompt-based relighting and long-horizon continuation produce stable relit videos under diverse target lighting while preserving source content and temporal consistency.

Comparison on Temporal Consistency

Temporal comparisons evaluate consistency for both inverse decomposition and forward relighting across long-horizon video segments.
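
One simple way to put a number on chunk-boundary discontinuities is to compare the frame-to-frame change at chunk boundaries with the change elsewhere in the video. The sketch below is illustrative only and is not the evaluation protocol of the paper: `boundary_jump_ratio` is a hypothetical helper, and the mean absolute difference is an assumed choice of distance.

```python
import numpy as np

def boundary_jump_ratio(frames, boundaries, eps=1e-12):
    """Mean frame-to-frame change at chunk boundaries divided by the mean change elsewhere.

    frames: array of shape (T, ...).
    boundaries: frame indices at which a new chunk starts.
    """
    frames = np.asarray(frames, dtype=np.float64)
    boundaries = list(boundaries)
    num_steps = len(frames) - 1
    if any(not 1 <= b <= num_steps for b in boundaries):
        raise ValueError("each boundary must satisfy 1 <= b < number of frames")
    # step[i] is the mean absolute change between frame i and frame i + 1.
    step = np.abs(np.diff(frames, axis=0)).reshape(num_steps, -1).mean(axis=1)
    at_boundary = np.zeros(num_steps, dtype=bool)
    at_boundary[np.asarray(boundaries, dtype=int) - 1] = True
    if not at_boundary.any() or at_boundary.all():
        raise ValueError("need at least one boundary step and one non-boundary step")
    return step[at_boundary].mean() / (step[~at_boundary].mean() + eps)

# Synthetic check: a slow brightness ramp, with and without a jump at frame 8.
t = np.arange(16, dtype=np.float64)
smooth = (0.1 * t)[:, None, None] * np.ones((1, 4, 4))
jumpy = smooth.copy()
jumpy[8:] += 1.0
print(boundary_jump_ratio(smooth, [8]), boundary_jump_ratio(jumpy, [8]))
```

A ratio near 1 means boundary transitions look like any other transition, while larger values indicate a visible jump where chunks meet.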

Video Relighting

We use Nano Banana Pro to generate a warm-start prompt anchor for the first frame, then propagate the relit target-domain state across the video under diverse target illuminations.

Editing

We use Nano Banana Pro to create edited first-frame warm starts, and HorizonRelight propagates the prompted style while maintaining long-horizon temporal coherence.

Ablation on Stretched Warm-start Prompting

Nano Banana Pro warm-start images can be spatially stretched relative to the predicted structural cues. This ablation compares results with the prompting-frame G-buffer disabled and enabled for the warm-start frame.

BibTeX

@inproceedings{yang2026horizonrelight,
  title     = {HorizonRelight: Relighting Long-horizon Videos Consistently via Diffusion Transformers},
  author    = {Yang, Jing and Jaiswal, Mayoore and Wang, Zian and Zeng, Steven and Pereira, Rochelle and Zhao, Yajie and Min, Jianyuan},
  booktitle = {European Conference on Computer Vision (ECCV)},
  eprint    = {2606.29095},
  archivePrefix = {arXiv},
  url       = {https://arxiv.org/abs/2606.29095},
  year      = {2026}
}