Toronto AI Lab NVIDIA Research
DiPIR

Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering

1 NVIDIA
2 University of Toronto
3 Vector Institute
ECCV 2024
Teaser examples of virtual object insertion and applications. Prompts shown include "A vase with dried flowers", "A black SUV car", "A metal bucket", and "A white SUV car"; further panels show inserting multiple objects and optimizing material & tone mapping.

This work presents Diffusion Prior for Inverse Rendering (DiPIR), a physically based method that recovers lighting from a single image, enabling arbitrary virtual objects to be composited into indoor and outdoor scenes, as well as material and tone-mapping optimization.

Abstract


The correct insertion of virtual objects in images of real-world scenes requires a deep understanding of the scene's lighting, geometry and materials, as well as the image formation process. While recent large-scale diffusion models have shown strong generative and inpainting capabilities, we find that current models do not sufficiently “understand” the scene shown in a single picture to generate consistent lighting effects (shadows, bright reflections, etc.) while preserving the identity and details of the composited object. We propose using a personalized large diffusion model as guidance to a physically based inverse rendering process. Our method recovers scene lighting and tone-mapping parameters, allowing the photorealistic composition of arbitrary virtual objects in single frames or videos of indoor or outdoor scenes. Our physically based pipeline further enables automatic material and tone-mapping refinement.

Method overview. Given an input image, we first construct a virtual 3D scene with a virtual object and proxy plane. Our physically-based renderer then differentiably simulates the interactions of the optimizable environment map with the inserted virtual object and its effect on the background scene (shadowing) (left). At each iteration, the rendered image is diffused and passed through a personalized diffusion model (middle). The gradient of the adapted Score Distillation formulation is propagated back to the environment map and the tone-mapping curve through the differentiable renderer. Upon convergence, we recover lighting and tone-mapping parameters, which allow photorealistic compositing of virtual objects from a single image (right).
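At a high level, each iteration renders the composite, perturbs it with noise, queries the personalized diffusion model, and backpropagates a Score Distillation-style gradient into the lighting parameters. The loop can be sketched as follows, with the renderer and denoiser replaced by trivial stand-ins; every name here is an illustrative assumption, not the paper's code:

```python
import numpy as np

# Minimal sketch of a score-distillation-guided lighting loop. The "renderer"
# and "denoiser" are toy stand-ins: in DiPIR the renderer is a physically
# based differentiable renderer and the residual comes from a personalized
# diffusion model.

rng = np.random.default_rng(0)
env_map = np.full(16, 0.5)         # optimizable environment-map pixels
prior = rng.uniform(0.2, 0.8, 16)  # stands in for what the diffusion prior favors

def render(env):
    """Stand-in for the differentiable renderer (identity shading)."""
    return env

def denoiser_residual(noisy, noise):
    """Stub for eps_phi(x_t, t) - eps; here it pulls the render toward `prior`."""
    return (noisy - noise) - prior

lr = 0.1
for step in range(200):
    image = render(env_map)
    noise = rng.normal(0.0, 0.01, image.shape)
    noisy = image + noise                    # forward diffusion (single noise level)
    grad = denoiser_residual(noisy, noise)   # SDS-style gradient, with w(t) = 1
    env_map -= lr * grad                     # chain rule through the identity render
```

In the real pipeline the same gradient also flows into the tone-mapping curve, since both the lighting and the curve sit upstream of the rendered image.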

Results


We demonstrate the effectiveness of our method on a variety of indoor and outdoor scenes, using Waymo outdoor driving scenes and unwrapped indoor HDRI panoramas as target background images for evaluation. Compared to prior methods, ours estimates the lighting conditions more accurately, producing more consistent insertions of virtual 3D objects into the background images.

Optimization Process

Our diffusion-guided lighting optimization process for the inserted virtual object in the Waymo scene.

Visual Comparison on Insertions into Waymo Scenes

Hold-Geoffroy et al.
NLFE
StyleLight
DiffusionLight
DiPIR

Visual Comparison on Insertions into HDRIs

InvRend3D
StyleLight
DiffusionLight
DiPIR
Reference

Animating Inserted Virtual Object

We animate the background image or move the object's position to create dynamic scenes.

Virtual Object Insertion in Multiple Views

We extend insertion to multiple camera views from Waymo scenes.

Applications


We use our method to optimize differentiable material properties from inserted objects.

We use our method to optimize differentiable tone-mapping curves to improve the realism.
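As a toy illustration of what "optimizing a differentiable tone-mapping curve" means, the sketch below fits a simple exposure + gamma parameterization by gradient descent with hand-derived gradients. The curve model, the L2 target, and all names are assumptions for illustration; in DiPIR the curve is driven by the diffusion guidance rather than a known reference:

```python
import numpy as np

# Toy differentiable tone-mapping optimization: fit (exposure, gamma) so that
# tonemap(hdr) matches a reference. Purely illustrative of the mechanism.

rng = np.random.default_rng(1)
hdr = rng.uniform(0.0, 4.0, 256)  # linear HDR radiance samples

def tonemap(x, exposure, gamma):
    # Clipping keeps the power and its log-gradient finite near zero.
    return np.clip(exposure * x, 1e-8, None) ** gamma

# Pretend a reference tone-mapped image exists (here: a known curve).
target = tonemap(hdr, 0.5, 1.0 / 2.2)

def loss(exposure, gamma):
    return np.mean((tonemap(hdr, exposure, gamma) - target) ** 2)

exposure, gamma = 1.0, 1.0
init_loss = loss(exposure, gamma)
lr = 0.01
for _ in range(2000):
    base = np.clip(exposure * hdr, 1e-8, None)
    err = base ** gamma - target
    # Hand-derived gradients of the (half) squared error w.r.t. curve params.
    exposure -= lr * np.mean(err * gamma * base ** (gamma - 1.0) * hdr)
    gamma -= lr * np.mean(err * base ** gamma * np.log(base))
final_loss = loss(exposure, gamma)
```

Because both parameters enter the rendered image differentiably, the same update works when the gradient instead arrives from the diffusion-guidance loss.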

Paper


Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering

Ruofan Liang, Zan Gojcic, Merlin Nimier-David, David Acuna, Nandita Vijaykumar, Sanja Fidler, Zian Wang

arXiv [TBA]
Paper
BibTeX

Citation


@article{liang2024photorealistic,
    author = {Ruofan Liang and Zan Gojcic and Merlin Nimier-David and David Acuna 
              and Nandita Vijaykumar and Sanja Fidler and Zian Wang},
    title = {Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering},
    journal = {arXiv preprint},
    year = {2024}
}