LCM-Lookahead for Encoder-based Text-to-Image Personalization

Recent advancements in diffusion models have introduced fast sampling methods that can effectively produce high-quality images in just one or a few denoising steps. Interestingly, when these are distilled from existing diffusion models, they often maintain alignment with the original model, retaining similar outputs for similar prompts and seeds. These properties present opportunities to leverage fast sampling methods as a shortcut mechanism, using them to create a preview of denoised outputs through which we can backpropagate image-space losses.

Consolidating Attention Features for Multi-view Image Editing

Large-scale text-to-image models enable a wide range of image editing techniques, using text prompts or even spatial controls. However, applying these editing methods to multi-view images depicting a single scene leads to 3D-inconsistent results. In this work, we focus on spatial control-based geometric manipulations and introduce a method to consolidate the editing process across various views.

Enze Xie

Enze Xie is a Senior Research Scientist at NVIDIA Research. Previously, he was a Principal Researcher and Research Lead at Huawei Noah's Ark Lab (Hong Kong). He obtained his PhD from HKU MMLab in 2022. His current research focuses mainly on multimodal generation, understanding, and acceleration.

Hugo Hadfield

Hugo Hadfield is a Senior Robotics Research Software Engineer at NVIDIA. He completed his PhD at the University of Cambridge in the area of geometric methods for computer vision and robotics and subsequently worked in industry on localization, calibration, dataset generation and real-time control for end-to-end-learnt self-driving cars. At NVIDIA, his research focuses on the development of novel techniques across the spectrum of modern dexterous and mobile robotics, as well as their productionization and deployment in real-world, real-time scenarios.

Reconstructing Translucent Thin Objects from Photos

The joint reconstruction of shape and appearance for translucent objects from real-world data poses a challenge in computer graphics, especially when dealing with complex layered materials like leaves or paper. The traditional assumption of diffuse transmittance falls short, and more accurate Monte-Carlo-based models are often needed to reproduce their appearance. To accurately capture the translucent appearance, an acquisition system needs to be carefully designed.