Spatial Intelligence Lab NVIDIA Research

ArtiFixer: Enhancing and Extending 3D Reconstruction
with Auto-Regressive Diffusion Models

1NVIDIA · 2ETH Zurich · 3Cornell University · 4University of Toronto · 5Vector Institute
* Equal Contribution
SIGGRAPH 2026

Abstract


Per-scene optimization methods such as 3D Gaussian Splatting provide state-of-the-art novel view synthesis quality but extrapolate poorly to under-observed areas. Methods that leverage generative priors to correct artifacts in these areas hold promise but currently suffer from two shortcomings. The first is scalability, as existing methods use image diffusion models or bidirectional video models that are limited in the number of views they can generate in a single pass (and thus require a costly iterative distillation process for consistency). The second is quality itself, as generators used in prior work tend to produce outputs that are inconsistent with existing scene content and fail entirely in completely unobserved regions.

To address these shortcomings, we propose a two-stage pipeline built on two key insights. First, we train a powerful bidirectional generative model with a novel opacity mixing strategy that encourages consistency with existing observations while retaining the model's ability to extrapolate novel content in unseen areas. Second, we distill it into a causal auto-regressive model that generates hundreds of frames in a single pass. This model can directly produce novel views or serve as pseudo-supervision to improve the underlying 3D representation in a simple and highly efficient manner. We evaluate our method extensively and demonstrate that it can generate plausible reconstructions in scenarios where existing approaches fail completely. When measured on commonly benchmarked datasets, we outperform all existing baselines by a wide margin, exceeding prior state-of-the-art methods by 1–3 dB PSNR.

Method


ArtiFixer is a two-stage pipeline. In Phase I, we finetune a bidirectional video diffusion model using an opacity mixing strategy: rather than starting from pure noise or directly from degraded renderings, we encode the input RGB into latent space and mix with Gaussian noise using the rendered opacity maps. This encourages the model to remain consistent with existing scene content while retaining full generative capability in unseen regions. We additionally inject fine-grained opacity information and camera control signals, along with clean reference views and an optional text prompt.
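The opacity mixing step can be sketched as a per-pixel blend between the encoded rendering and Gaussian noise, weighted by the rendered opacity. This is a minimal sketch under our own assumptions (a simple linear blend with the opacity map broadcast over latent channels; the paper's exact noising schedule may differ), with `opacity_mix` as a hypothetical helper name:

```python
import numpy as np

def opacity_mix(latent, opacity, rng):
    """Blend an encoded rendering with Gaussian noise per pixel.

    latent:  (C, H, W) latent encoding of the degraded input rendering
    opacity: (H, W) rendered opacity in [0, 1]; ~1 where the scene is
             well observed, ~0 in empty or unseen regions
    """
    noise = rng.standard_normal(latent.shape)
    alpha = opacity[None, :, :]  # broadcast opacity over latent channels
    # High opacity -> keep the rendering latent (stay consistent with
    # observed content); low opacity -> pure noise, leaving the model
    # free to generate novel content in unseen regions.
    return alpha * latent + (1.0 - alpha) * noise
```

The intent is that fully observed regions pass through unchanged, so the model has no incentive to alter them, while fully unobserved regions look like an ordinary diffusion starting point.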

In Phase II, we distill the bidirectional teacher into a causal autoregressive model via Self-Forcing-style DMD distillation. The resulting model generates hundreds of frames in a single pass, which can be used directly for novel view synthesis or as pseudo-supervision to improve the underlying 3D representation.
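The causal rollout can be illustrated with a schematic chunked generation loop: each chunk is conditioned only on previously generated frames, so arbitrarily long trajectories come from a single forward sweep rather than bidirectional joint denoising. This is a structural sketch only; `generate_chunk` is a hypothetical stand-in for the distilled model, and the real system operates on latents with camera conditioning:

```python
import numpy as np

def rollout(generate_chunk, first_frame, n_frames, chunk=8, context=4):
    """Autoregressively generate `n_frames` frames in fixed-size chunks,
    conditioning each chunk on the tail of what was generated so far."""
    frames = [first_frame]
    while len(frames) < n_frames:
        ctx = np.stack(frames[-context:])  # causal context window
        new = generate_chunk(ctx, chunk)   # (chunk, H, W, C) new frames
        frames.extend(list(new))
    return np.stack(frames[:n_frames])
```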

Figure: ArtiFixer method overview.

MipNeRF 360 Comparisons


We render novel orbit trajectories and compare ArtiFixer3D+ to its base 3DGUT rendering, GenFusion, and GSFixer on all scenes in MipNeRF 360's most challenging 3-view split. To our knowledge, our quality exceeds that of all previously published work.

DL3DV Comparisons


We compare our method to a variety of generative baselines on sparse reconstructions from the DL3DV-10K dataset.

We compare ArtiFixer3D+ on DL3DV to 3DGUT and two baselines that build upon bidirectional video diffusion models. GenFusion's base model generates 16 frames at a time, requiring an iterative distillation process that leads to blurry results, especially in empty areas. Gen3C's renderings are sharper but often do not respect the source content. Our method reconstructs plausible and consistent geometry even when the initial rendering is highly degraded.

Nerfbusters Comparisons


As in the other datasets, our method is the only one that can generate plausible visuals in unobserved areas while respecting source fidelity.

Conditioning


We drop the initial rendering condition, forcing the model to reconstruct the scene from the reference views. Although fidelity drops somewhat, the high-level structure of the scene remains intact along with the correct camera motion.


ArtiFixer retains a strong generative ability thanks to opacity mixing and training dropout, and is able to generate videos from text prompts alone, similar to its base model.

Prompt: A bronze statue of two children standing back to back on a stone pedestal inside a modern exhibition hall. The taller child, wearing a dress and carrying a backpack, has one hand resting on the smaller child's shoulder…

ArtiFixer Variants


We evaluate three variants: ArtiFixer, which directly renders novel views from the auto-regressive generator; ArtiFixer3D, which distills its outputs back into the underlying 3D representation; and ArtiFixer3D+, which re-applies the auto-regressive model as post-processing on top of ArtiFixer3D (as in Difix3D+). All variants produce similar renderings: ArtiFixer's are slightly sharper, ArtiFixer3D's are more consistent with source images at the cost of some blurriness, and ArtiFixer3D+ restores sharpness while remaining highly consistent.
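Very schematically, the relationship between the three variants can be expressed as a small dispatch function. All names here (`run_variant`, `generate`, `distill`) are hypothetical placeholders for illustration, not the actual implementation:

```python
def run_variant(variant, scene, generate, distill, trajectory):
    """Schematic view of the three ArtiFixer variants.

    generate: the auto-regressive generator, applied to renderings
    distill:  optimizes the 3D scene against generated pseudo-views
    """
    views = generate(scene.render(trajectory))       # ArtiFixer: direct generation
    if variant == "ArtiFixer":
        return views
    scene3d = distill(scene, views)                  # ArtiFixer3D: bake into 3D
    if variant == "ArtiFixer3D":
        return scene3d.render(trajectory)
    return generate(scene3d.render(trajectory))      # ArtiFixer3D+: post-process
```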

Denoising Steps


As our method starts from renderings instead of pure noise, it can generate plausible visuals in fewer than four denoising steps in most cases, though sharpness and temporal consistency suffer somewhat in empty areas. We compare different denoising step counts on a slightly shifted trajectory distilled into the representation (ArtiFixer3D). Renderings are generally stable across step counts, apart from minor changes near the previously unexplored periphery.
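Starting from a rendering rather than pure noise amounts to SDEdit-style sampling: noise the input to an intermediate time and integrate only the remainder of the trajectory. The sketch below assumes a rectified-flow parameterization where `denoiser(x, t)` predicts a velocity field; the actual model and schedule may differ:

```python
import numpy as np

def few_step_denoise(denoiser, rendering, n_steps, t_start=0.4, rng=None):
    """Noise the degraded rendering to an intermediate time t_start,
    then take only a few Euler steps toward t = 0, instead of
    integrating the full trajectory from pure noise (t = 1)."""
    if rng is None:
        rng = np.random.default_rng()
    # Partially noised starting point: a mix of rendering and noise.
    x = (1 - t_start) * rendering + t_start * rng.standard_normal(rendering.shape)
    ts = np.linspace(t_start, 0.0, n_steps + 1)
    for t0, t1 in zip(ts[:-1], ts[1:]):
        x = x + (t1 - t0) * denoiser(x, t0)  # Euler step on the flow ODE
    return x
```

Because `t_start` is well below 1, far fewer steps are needed to reach a clean sample than when integrating from pure noise.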

Citation


@inproceedings{delutio2026artifixer,
    title={ArtiFixer: Enhancing and Extending 3D Reconstruction with Auto-Regressive Diffusion Models},
    author={de Lutio, Riccardo and Fischer, Tobias and Chang, Yen-Yu and Zhang, Yuxuan and
            Wu, Jay Zhangjie and Ren, Xuanchi and Shen, Tianchang and Tothova, Katarina and
            Gojcic, Zan and Turki, Haithem},
    booktitle={SIGGRAPH},
    year={2026}
}