Computer Vision

Bootstrap Your Generator: Unpaired Visual Editing with Flow Matching

Modern generative models possess a deep understanding of visual content, yet training them for image editing typically requires massive datasets of paired examples. This limits scalability, especially for video editing where collecting paired data is …

GSVisLoc: Generalizable Visual Localization for Gaussian Splatting Scene Representations

Motion by Queries: Identity-Motion Trade-offs in Text-to-Video Generation

TriTex: Learning Texture from a Single Mesh via Triplane Semantic Features

As 3D content creation continues to grow, transferring semantic textures between 3D meshes remains a significant challenge in computer graphics. While recent methods leverage text-to-image diffusion models for texturing, they often struggle to …

Adapting to the Unknown: Training-Free Audio-Visual Event Perception with Dynamic Thresholds

Make It Count: Text-to-Image Generation with an Accurate Number of Objects

Real-Time Rate Control for Task-Aware Video Compression Using Reinforcement Learning

Robust Equivariant Multiview Structure from Motion

Multi-Task Learning as a Bargaining Game

In multi-task learning (MTL), a joint model is trained to simultaneously make predictions for several tasks. Joint training reduces computation costs and improves data efficiency; however, since the gradients of these different tasks may conflict, training a joint model for MTL often yields lower performance than its corresponding single-task counterparts.

StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators

Can a generative model be trained to produce images from a specific domain, guided by a text prompt only, without seeing any image? In other words: can an image generator be trained blindly?