Computer Vision

OmnimatteZero: Training-free Real-time Omnimatte with Pre-trained Video Diffusion Models

TriTex: Learning Texture from a Single Mesh via Triplane Semantic Features

As 3D content creation continues to grow, transferring semantic textures between 3D meshes remains a significant challenge in computer graphics. While recent methods leverage text-to-image diffusion models for texturing, they often struggle to …

Lightning-Fast Image Inversion and Editing for Text-to-Image Diffusion Models

Diffusion inversion is the problem of taking an image and a text prompt that describes it and finding a noise latent that would generate the exact same image. Most current deterministic inversion techniques operate by approximately solving an …
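The deterministic round trip described above can be sketched with a toy DDIM-style inversion. Everything here is an illustrative stand-in, not the paper's method: `eps_model` replaces a real text-conditioned noise predictor, and the schedule and shapes are invented for the example.

```python
import numpy as np

T = 50
betas = np.linspace(1e-4, 0.02, T)
# abar[0] = 1 corresponds to the clean image; abar[t] shrinks as t grows.
abar = np.concatenate([[1.0], np.cumprod(1.0 - betas)])

rng = np.random.default_rng(0)
EPS_CONST = rng.standard_normal(4)  # frozen "noise prediction" for the toy model

def eps_model(x, t):
    # Stand-in for a trained noise predictor eps_theta(x_t, t, prompt).
    return EPS_CONST

def ddim_invert(x0):
    # Run the deterministic DDIM update in reverse: image -> noise latent.
    x = x0.copy()
    for t in range(T):
        eps = eps_model(x, t)
        x0_pred = (x - np.sqrt(1 - abar[t]) * eps) / np.sqrt(abar[t])
        x = np.sqrt(abar[t + 1]) * x0_pred + np.sqrt(1 - abar[t + 1]) * eps
    return x

def ddim_sample(xT):
    # Deterministic DDIM sampling: noise latent -> image.
    x = xT.copy()
    for t in reversed(range(T)):
        eps = eps_model(x, t + 1)
        x0_pred = (x - np.sqrt(1 - abar[t + 1]) * eps) / np.sqrt(abar[t + 1])
        x = np.sqrt(abar[t]) * x0_pred + np.sqrt(1 - abar[t]) * eps
    return x

image = rng.standard_normal(4)
latent = ddim_invert(image)
recon = ddim_sample(latent)
print(np.max(np.abs(recon - image)))  # near-zero: round trip is exact here
```

With a real network, `eps_model(x, t)` at inversion time only approximates the noise the sampler will see one step later, which is why practical inversion is approximate; with the constant predictor above the round trip is exact up to floating point.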

Fast Encoder-Based 3D from Casual Videos via Point Track Processing

This paper addresses the long-standing challenge of reconstructing 3D structures from videos with dynamic content. Current approaches either are not designed to operate on casual videos recorded by standard cameras or require a long …

Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models

Abstract: Text-to-image (T2I) personalization allows users to guide the creative image generation process by combining their own visual concepts in natural language prompts. Recently, encoder-based techniques have emerged as an effective new approach for T2I personalization, reducing the need for multiple images and long training times.

Learning to Initiate and Reason in Event-Driven Cascading Processes

We describe “Cascade”, a new counterfactual reasoning setup. An agent is given a semantic instruction and the outcome of a played-out dynamical system. Its goal is to intervene in the dynamic environment, triggering a cascade of events that leads to a different, counterfactual outcome.

Point-Cloud Completion with Pretrained Text-to-image Diffusion Models

Abstract: Point-cloud data collected in real-world applications are often incomplete because objects are observed from specific viewpoints that capture only a single perspective. Data can also be incomplete due to occlusion and low-resolution sampling.

Key-Locked Rank One Editing for Text-to-Image Personalization

Summary: We present Perfusion, a new text-to-image personalization method. With a model size of only 100KB, trained in roughly 4 minutes, Perfusion can creatively portray personalized objects. It allows significant changes in their appearance while maintaining their identity, using a novel mechanism we call “Key-Locking”.
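In the spirit of the rank-one editing named in the title, a hedged illustration of a rank-one weight update that locks one key to one value (the matrix, key, and value below are invented stand-ins, not Perfusion's actual Key-Locking procedure):

```python
import numpy as np

rng = np.random.default_rng(1)
d_out, d_in = 8, 6
W = rng.standard_normal((d_out, d_in))   # e.g. a cross-attention projection
k_star = rng.standard_normal(d_in)       # key for the personalized concept
v_star = rng.standard_normal(d_out)      # target value for that key

def rank_one_edit(W, k, v):
    # W' = W + (v - W k) k^T / (k^T k), so that W' k = v exactly,
    # while any input orthogonal to k is mapped as before.
    residual = v - W @ k
    return W + np.outer(residual, k) / (k @ k)

W_edited = rank_one_edit(W, k_star, v_star)
print(np.allclose(W_edited @ k_star, v_star))  # True
```

The update touches only a rank-one subspace, which is why such edits can be stored in kilobytes rather than a full model copy.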

Encoder-based Domain Tuning for Fast Personalization of Text-to-Image Models

Summary: We use an encoder to personalize a text-to-image model to new concepts with a single image and 5-15 tuning steps. Abstract: Text-to-image personalization aims to teach a pre-trained diffusion model to reason about novel, user-provided concepts, embedding them into new scenes guided by natural language prompts.

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

Abstract: Text-to-image models offer unprecedented freedom to guide creation through natural language. Yet, it is unclear how such freedom can be exercised to generate images of specific unique concepts, modify their appearance, or compose them in new roles and novel scenes.