Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models
Add-it is a training-free approach for inserting objects into images from a simple text prompt. It extends the attention mechanism of a pretrained diffusion model to incorporate information from three sources: the source image, the text prompt, and the generated image itself. This weighted extended-attention scheme is combined with subject-guided latent blending and a noise structure transfer step.
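As a rough illustration of the two ideas named above, the following minimal sketch shows one plausible form of weighted attention over several key/value sources and a mask-based latent blend. It is not the Add-it implementation: the function names, the choice to scale each source's attention logits by a scalar weight, and the linear blend rule are all assumptions made for illustration, and the abstract does not specify how the weights or the mask are obtained.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_extended_attention(query, sources, weights):
    """Attend from one query vector over keys/values pooled from several sources.

    sources: dict mapping a source name to (keys, values), each a list of vectors.
    weights: dict mapping a source name to a scalar applied to that source's
             logits (hypothetical weighting; equivalent to scaling its keys).
    Returns (output_vector, attention_probabilities).
    """
    scale = math.sqrt(len(query))
    logits, values = [], []
    for name, (keys, vals) in sources.items():
        w = weights.get(name, 1.0)
        for k, v in zip(keys, vals):
            logits.append(w * dot(query, k) / scale)
            values.append(v)
    probs = softmax(logits)
    dim = len(values[0])
    out = [sum(p * v[i] for p, v in zip(probs, values)) for i in range(dim)]
    return out, probs


def blend_latents(source_latent, target_latent, mask):
    """Keep the generated latent where mask is 1 and the source latent where it is 0."""
    return [m * t + (1.0 - m) * s
            for s, t, m in zip(source_latent, target_latent, mask)]


if __name__ == "__main__":
    # Toy data: one token per source, two-dimensional features.
    sources = {
        "source_image": ([[1.0, 0.0]], [[1.0, 0.0]]),
        "prompt": ([[0.0, 1.0]], [[0.0, 1.0]]),
        "generated": ([[1.0, 1.0]], [[1.0, 1.0]]),
    }
    weights = {"source_image": 2.0, "prompt": 1.0, "generated": 1.0}
    out, probs = weighted_extended_attention([1.0, 0.0], sources, weights)
    print(out, probs)
```

In this toy setup, raising the weight of one source shifts attention mass toward its tokens, which is the balancing role a weighted scheme plays among the source image, the prompt, and the generated image.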
Add-it achieves state-of-the-art results on both real and generated image insertion benchmarks, and introduces the “Additing Affordance Benchmark” to evaluate the plausibility of object placement. The method produces realistic placements while preserving scene structure and fine details.