Difix3D+ corrects NeRF and 3DGS artifacts in underconstrained regions, enhancing overall 3D representation quality.
Neural Radiance Fields and 3D Gaussian Splatting have revolutionized 3D reconstruction and novel-view synthesis tasks. However, achieving photorealistic rendering from extreme novel viewpoints remains challenging, as artifacts persist across representations. In this work, we introduce Difix3D+, a novel pipeline designed to enhance 3D reconstruction and novel-view synthesis through single-step diffusion models. At the core of our approach is Difix, a single-step image diffusion model trained to enhance rendered novel views by removing the artifacts caused by underconstrained regions of the 3D representation. Difix serves two critical roles in our pipeline. First, it is used during the reconstruction phase to clean up pseudo-training views that are rendered from the reconstruction and then distilled back into 3D. This greatly enhances underconstrained regions and improves the overall 3D representation quality. More importantly, Difix also acts as a neural enhancer during inference, effectively removing residual artifacts arising from imperfect 3D supervision and the limited capacity of current reconstruction models. Difix3D+ is a general solution, a single model compatible with both NeRF and 3DGS representations, and it achieves an average 2$\times$ improvement in FID score over baselines while maintaining 3D consistency.
Blue Cameras: Training Views;
Red Cameras: Target Views;
Orange Cameras: Intermediate Novel Views along the progressive 3D updating trajectory.
Step 1: Given a pretrained 3D representation, we render novel views and feed them to Difix, which acts as a neural enhancer, removing the artifacts and improving the quality of the noisy rendered views. The camera poses selected to render the novel views are obtained through pose interpolation, gradually approaching the target poses from the reference poses.
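The pose-interpolation idea above can be sketched with standard camera-pose blending: spherical linear interpolation (slerp) for the rotation and linear interpolation for the translation. This is a minimal illustration, not the authors' implementation; the function name and the 5-step schedule are our own.

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def interpolate_pose(R_ref, t_ref, R_tgt, t_tgt, alpha):
    """Blend a reference and a target camera pose.

    R_ref, R_tgt : scipy Rotation objects (camera orientations)
    t_ref, t_tgt : (3,) camera translations
    alpha        : 0.0 -> reference pose, 1.0 -> target pose
    """
    key_rots = Rotation.from_quat(
        np.stack([R_ref.as_quat(), R_tgt.as_quat()]))
    R_mid = Slerp([0.0, 1.0], key_rots)(alpha)  # slerp for rotation
    t_mid = (1.0 - alpha) * np.asarray(t_ref) + alpha * np.asarray(t_tgt)
    return R_mid, t_mid

# A schedule of poses gradually approaching the target from the reference:
schedule = np.linspace(0.0, 1.0, 5)
```

With this schedule, each rendered novel view stays close to well-observed regions, which keeps the diffusion model well conditioned.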
Step 2: The cleaned novel views are distilled back into the 3D representation to improve its quality. Steps 1 and 2 are applied over several iterations to progressively grow the spatial extent of the reconstruction and hence ensure strong conditioning of the diffusion model.
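The iterative render-enhance-distill loop of Steps 1 and 2 can be sketched as below. The function names (`render`, `enhance`, `distill`) are illustrative placeholders, not the authors' API; the point is that each iteration pushes the rendered poses further along the trajectory from the reference views toward the targets before distilling the cleaned views back into 3D.

```python
def progressive_update(render, enhance, distill, num_iters=4):
    """Sketch of the progressive 3D update loop (hypothetical API).

    render(alpha)  -> novel views at poses alpha of the way to the target
    enhance(views) -> artifact-free views (the role Difix plays)
    distill(views) -> update the 3D representation from the cleaned views
    """
    for i in range(1, num_iters + 1):
        alpha = i / num_iters            # grow the spatial extent each pass
        views = render(alpha)            # Step 1: render intermediate views
        cleaned = enhance(views)         # Step 1: remove rendering artifacts
        distill(cleaned)                 # Step 2: distill back into 3D
```

Growing `alpha` gradually means the diffusion model always enhances views rendered near already-reconstructed regions, rather than being asked to fix extreme extrapolations in one shot.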
Step 3: Difix additionally acts as a real-time neural enhancer, further improving the quality of the rendered novel views.
Difix is a single-step diffusion model finetuned from Stable Diffusion. It takes a noisy rendered image and reference views as input (left), and outputs an enhanced version of the input image with reduced artifacts (right).
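A single-step diffusion model maps a corrupted input to a clean estimate in one network evaluation rather than through many denoising steps. As a minimal illustration of the underlying identity (standard DDPM algebra, not Difix's actual architecture or training), given a noise prediction at noise level $\bar{\alpha}_t$ the clean image estimate is recovered in closed form:

```python
import numpy as np

def single_step_x0(x_t, eps_pred, alpha_bar_t):
    """One-step clean-image estimate from the DDPM forward process
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps.
    Solving for x_0 given the model's noise prediction eps_pred."""
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)
```

In Difix the "noise" is played by rendering artifacts, so a single forward pass suffices, which is what makes the Step 3 real-time neural enhancement at inference practical.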
@article{wu2025difix3d,
title={Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models},
author={Jay Zhangjie Wu and Yuxuan Zhang and Haithem Turki and Xuanchi Ren and Jun Gao and Mike Zheng Shou and Sanja Fidler and Zan Gojcic and Huan Ling},
journal={arXiv preprint arXiv:2503.01774},
year={2025}
}