ChronoEdit:
Towards Temporal Reasoning for Image Editing and World Simulation
TL;DR: ChronoEdit reframes image editing as a video generation task to encourage temporal consistency. It leverages a temporal reasoning stage that jointly denoises “temporal reasoning tokens,” letting the model “reason” about physically plausible edits.
Gallery
Image Editing Results
Temporal Reasoning Visualization
Built on a video model, ChronoEdit can visualize how it “reasons” through an edit by denoising the temporal reasoning tokens, revealing the editing trajectory behind its final output.
We introduce temporal reasoning tokens between the reference and edited image latents, serving as intermediate guidance that helps the model “think” through plausible editing trajectories. At inference, for efficiency, these tokens need not be fully denoised; in the results shown below, however, we optionally denoise them into a clean video to visualize how the model reasons about and interprets an editing task. Note that, in the context of image editing, the final frame of each video is the output edited image.
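To make the token layout and the optional visualization pass concrete, here is a minimal PyTorch-style sketch. Everything in it is illustrative: the latent shapes, the number of reasoning tokens, and the `denoiser` stub are assumptions for exposition, not ChronoEdit's actual code.

```python
import torch

C, H, W = 16, 32, 32   # latent channels / spatial size (illustrative values)
NUM_REASONING = 4      # number of temporal reasoning tokens (assumed)

def denoiser(latents: torch.Tensor, t: float) -> torch.Tensor:
    """Stand-in for the video diffusion denoiser; not the real model."""
    return latents * (1.0 - 0.5 * t)  # placeholder dynamics

# Latent sequence: [reference image | reasoning tokens | edited image].
reference = torch.randn(1, C, H, W)              # clean latent of the input image
reasoning = torch.randn(NUM_REASONING, C, H, W)  # noisy intermediate frames
target = torch.randn(1, C, H, W)                 # noisy latent of the edit target
sequence = torch.cat([reference, reasoning, target], dim=0)

# Visualization mode: denoise *all* non-reference frames to a clean video,
# so the reasoning trajectory itself can be inspected.
for t in (1.0, 0.75, 0.5, 0.25, 0.0):
    sequence[1:] = denoiser(sequence, t)[1:]  # the reference frame stays fixed

trajectory_video = sequence         # the "how the model reasoned" video
edited_image_latent = sequence[-1]  # final frame = output edited image
```

The key point is only the layout: the reasoning tokens live between the reference and target latents in a single video latent sequence, and the last frame is always the edited image.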
Physical-AI-Related Tasks
ChronoEdit produces edits that faithfully follow the given instructions while preserving scene structure and fine details in Physical-AI-related scenes (such as autonomous driving or humanoid robotics), where maintaining physical consistency is especially critical.
Method

Overview of the ChronoEdit pipeline. From right to left, the denoising process begins in the temporal reasoning stage, where the model imagines and denoises a short trajectory of intermediate frames. These intermediate frames act as reasoning tokens, guiding how the edit should unfold in a physically consistent manner. For efficiency, the reasoning tokens are discarded in the subsequent editing frame generation stage, where the target frame is further refined into the final edited image.
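The two-stage schedule described in this caption can be sketched as follows. The `two_stage_edit` helper and the step values are hypothetical; only the overall structure (joint denoising, then dropping the reasoning tokens before refining the target frame) reflects the pipeline above.

```python
import torch

def two_stage_edit(sequence: torch.Tensor,
                   denoiser,
                   reasoning_steps=(1.0, 0.8, 0.6),
                   refine_steps=(0.4, 0.2, 0.0)) -> torch.Tensor:
    """Sketch of the two-stage schedule; step values are assumptions.

    `sequence` is laid out as [reference | reasoning tokens | target];
    the reference frame (index 0) is kept fixed throughout.
    """
    # Stage 1 (temporal reasoning): jointly denoise the reasoning tokens and
    # the target frame, so the edit unfolds along a plausible trajectory.
    for t in reasoning_steps:
        sequence[1:] = denoiser(sequence, t)[1:]

    # Stage 2 (editing frame generation): discard the partially denoised
    # reasoning tokens for efficiency and refine only the target frame.
    pair = torch.cat([sequence[:1], sequence[-1:]], dim=0)
    for t in refine_steps:
        pair[1:] = denoiser(pair, t)[1:]

    return pair[-1]  # latent of the final edited image
```

Because the reasoning tokens are dropped before the refinement loop, the second stage runs on a two-frame sequence (reference plus target), which is what makes skipping their full denoising an efficiency win at inference.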