Lars Johannsmeier

I am a research scientist at NVIDIA's Seattle Robotics Lab. I obtained my PhD from the Technical University of Munich under the supervision of Prof. Sami Haddadin. Before joining NVIDIA, I was head of the AI department at Franka Robotics GmbH, the creator of the most popular research robot worldwide. My research at NVIDIA focuses on two main questions: first, how to design intelligent robotic systems so that they are deployable in the real world; and second, how to model manipulation so that robots can solve complex tasks with performance and robustness comparable to humans.

TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models

Diffusion models have opened the path to a wide range of text-based image editing frameworks. However, these typically build on the multi-step nature of the backward diffusion process, and adapting them to distilled, fast-sampling methods has proven surprisingly challenging. Here, we focus on a popular line of text-based editing frameworks: the "edit-friendly" DDPM-noise inversion approach. We analyze its application to fast-sampling methods and categorize its failures into two classes: the appearance of visual artifacts, and insufficient editing strength.
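As a rough illustration of the inversion this line of work builds on: the edit-friendly approach samples each noisy latent independently from q(x_t | x_0) and then solves for the per-step noise maps under which the DDPM sampler reconstructs the input exactly. The sketch below captures that idea in PyTorch; the schedule and the `denoiser(x_t, t)` interface are illustrative assumptions, not the paper's code.

```python
import torch

# Toy DDPM schedule; `denoiser(x_t, t)` is a hypothetical stand-in
# for a noise-prediction network (all names here are illustrative).
T = 50
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

def edit_friendly_inversion(denoiser, x0):
    """Extract noise maps z_t under which DDPM sampling reconstructs
    x0 exactly; an edit reuses them with a changed text condition."""
    # Sample each x_t independently from q(x_t | x0) -- the key
    # difference from the ordinary, correlated forward chain.
    xs = [x0]
    for t in range(1, T):
        eps = torch.randn_like(x0)
        xs.append(alpha_bars[t].sqrt() * x0 + (1 - alpha_bars[t]).sqrt() * eps)
    # Solve for the z_t that carries x_t to x_{t-1} in the DDPM update
    # x_{t-1} = mu_t(x_t) + sigma_t * z_t.
    zs = {}
    for t in range(T - 1, 0, -1):
        eps_pred = denoiser(xs[t], t)
        x0_pred = (xs[t] - (1 - alpha_bars[t]).sqrt() * eps_pred) / alpha_bars[t].sqrt()
        mu = (alpha_bars[t - 1].sqrt() * betas[t] / (1 - alpha_bars[t])) * x0_pred \
            + (alphas[t].sqrt() * (1 - alpha_bars[t - 1]) / (1 - alpha_bars[t])) * xs[t]
        sigma = (betas[t] * (1 - alpha_bars[t - 1]) / (1 - alpha_bars[t])).sqrt()
        zs[t] = (xs[t - 1] - mu) / sigma
    return xs[-1], zs  # x_T and the per-step noise maps
```

Editing then runs the standard DDPM sampler from x_T under the target prompt, substituting the stored z_t for fresh Gaussian noise at each step; a distilled sampler leaves only a handful of steps over which those noise maps can act.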

LCM-Lookahead for Encoder-based Text-to-Image Personalization

Recent advancements in diffusion models have introduced fast sampling methods that can effectively produce high-quality images in just one or a few denoising steps. Interestingly, when these are distilled from existing diffusion models, they often maintain alignment with the original model, retaining similar outputs for similar prompts and seeds. These properties present opportunities to leverage fast sampling methods as a shortcut mechanism, using them to create a preview of denoised outputs through which we can backpropagate image-space losses.
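A minimal sketch of that lookahead idea in PyTorch, under assumed names (`fast_denoiser`, `image_loss`, and a precomputed `alpha_bar` schedule are illustrative stand-ins): a single cheap denoising step yields a preview of the clean image, and an image-space loss on that preview is backpropagated to whatever upstream module is being trained.

```python
import torch

# Hypothetical names: `fast_denoiser` is a distilled noise-prediction
# model, `image_loss` any differentiable image-space loss (e.g. an
# identity loss), `alpha_bar` the cumulative noise schedule.
def lookahead_loss(fast_denoiser, image_loss, x_t, t, alpha_bar, target):
    # One cheap step yields a preview of the denoised output x0;
    # gradients flow back through the fast model's prediction.
    eps = fast_denoiser(x_t, t)
    x0_preview = (x_t - (1 - alpha_bar[t]).sqrt() * eps) / alpha_bar[t].sqrt()
    # An image-space loss on the preview trains the upstream encoder.
    return image_loss(x0_preview, target)
```

Because the distilled model stays aligned with the original, this preview serves as a proxy for the full multi-step output, so the gradient signal remains meaningful for the original model as well.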

Consolidating Attention Features for Multi-view Image Editing

Large-scale text-to-image models enable a wide range of image editing techniques, using text prompts or even spatial controls. However, applying these editing methods to multi-view images depicting a single scene leads to 3D-inconsistent results. In this work, we focus on spatial control-based geometric manipulations and introduce a method to consolidate the editing process across various views.
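One simple mechanism in this spirit (a sketch of cross-view attention sharing, not necessarily the paper's exact method) is to let queries from each view attend to keys and values gathered from all views, so that appearance features are shared while editing. Shapes and names below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# Extended ("cross-view") self-attention: each view's queries attend
# over the keys/values of every view jointly, encouraging consistent
# appearance across views. Tensors are (n_views, n_tokens, dim).
def cross_view_attention(q, k, v):
    n_views, n_tokens, dim = k.shape
    k_all = k.reshape(1, n_views * n_tokens, dim).expand(n_views, -1, -1)
    v_all = v.reshape(1, n_views * n_tokens, dim).expand(n_views, -1, -1)
    return F.scaled_dot_product_attention(q, k_all, v_all)
```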

Enze Xie

Enze Xie is a Senior Research Scientist at NVIDIA Research. Previously, he was a Principal Researcher and Research Lead at Huawei Noah's Ark Lab (Hong Kong). He obtained his PhD from HKU MMLab in 2022. His current research focuses mainly on multimodal generation, understanding, and acceleration.

Hugo Hadfield

Hugo Hadfield is a Senior Robotics Research Software Engineer at NVIDIA. He completed his PhD at the University of Cambridge in the area of geometric methods for computer vision and robotics, and subsequently worked in industry on localization, calibration, dataset generation, and real-time control for end-to-end-learnt self-driving cars. At NVIDIA, his research focuses on developing novel techniques across the spectrum of modern dexterous and mobile robotics, as well as on their productionization and deployment in real-world, real-time scenarios.