Deep Learning Approaches to Grasp Synthesis: A Review

Grasping is the process of picking up an object by applying forces and torques at a set of contacts. Recent advances in deep learning have enabled rapid progress in robotic object grasping. In this systematic review, we survey the publications of the last decade, with a particular focus on grasping an object using all six degrees of freedom of the end-effector pose.
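
Concretely, a 6-DoF grasp couples a 3-D position with a 3-D orientation of the gripper. As an illustration only (the field uses many encodings, and this dataclass is mine, not taken from any surveyed paper), one minimal Python representation might look like:

```python
from dataclasses import dataclass

@dataclass
class Grasp6DoF:
    # 3 translational DoF: gripper position (x, y, z) in the scene frame
    position: tuple
    # 3 rotational DoF, stored here as a unit quaternion (w, x, y, z)
    orientation: tuple
    # opening width in metres; assumes a parallel-jaw gripper
    width: float = 0.08
```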

Fugatto 1: Foundational Generative Audio Transformer Opus 1

Fugatto is a versatile audio synthesis and transformation model capable of following free-form text instructions with optional audio inputs. While large language models (LLMs) trained with text on a simple next-token prediction objective can learn to infer instructions directly from the data, models trained solely on audio data lack this capacity. This is because audio data does not inherently contain the instructions that were used to generate it. To overcome this challenge, we introduce

Conformer without Convolutions

We analyze the weights of a trained speech-to-text neural network and discover a surprising amount of structure in the temporal convolutions. Based on our observations, we propose to completely remove the learnable temporal convolutions and replace them with fixed averaging and shift operations, which have no learnable parameters and open the way for significantly faster implementations. In the state-of-the-art Conformer, Squeezeformer, and FastConformer models, this improves WER by 0.12%, 0.62%, and 0.20%, respectively, while reducing the computational cost.
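
To make the idea concrete, here is a minimal sketch assuming PyTorch tensors of shape (batch, channels, time); the function names, the window size k, and the zero-padding choices are my assumptions, not the paper's implementation. It shows fixed, parameter-free averaging and shift operations of the kind that could stand in for a learnable temporal convolution:

```python
import torch
import torch.nn.functional as F

def fixed_average(x: torch.Tensor, k: int = 3) -> torch.Tensor:
    # Parameter-free temporal averaging over a k-frame window.
    # k is assumed odd so the sequence length is preserved.
    return F.avg_pool1d(x, kernel_size=k, stride=1, padding=k // 2)

def fixed_shift(x: torch.Tensor, offset: int = 1) -> torch.Tensor:
    # Parameter-free shift along the time axis, zero-padded at the boundary.
    if offset >= 0:
        return F.pad(x, (offset, 0))[..., : x.shape[-1]]
    return F.pad(x, (0, -offset))[..., -x.shape[-1] :]
```

Because the weights are constants, both operations can be fused or specialized at compile time, which is where a speed-up over a learnable convolution would come from.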

One-Shot Transfer of Long-Horizon Extrinsic Manipulation Through Contact Retargeting

Extrinsic manipulation, the use of environment contacts to achieve manipulation objectives, enables strategies that are otherwise impossible with a parallel-jaw gripper. However, orchestrating a long-horizon sequence of contact interactions between the robot, object, and environment is notoriously challenging due to scene diversity, the large action space, and difficult contact dynamics. We observe that most extrinsic manipulations are combinations of short-horizon primitives, each of which depends strongly on initializing from a desirable contact configuration to succeed.
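
The decomposition into contact-dependent primitives can be sketched in a few lines. This is an illustrative structure under my own assumptions; the Primitive fields and the retarget callback are hypothetical, not the paper's API:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Primitive:
    name: str
    # Does the current contact configuration allow this skill to succeed?
    precondition: Callable[[dict], bool]
    # Runs the short-horizon skill and returns the updated state.
    execute: Callable[[dict], dict]

def run_sequence(state: dict, primitives: List[Primitive], retarget) -> dict:
    for p in primitives:
        if not p.precondition(state):
            # Move into a desirable contact configuration before executing.
            state = retarget(state, p)
        state = p.execute(state)
    return state
```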

Lars Johannsmeier

I am a research scientist at the Seattle Robotics Lab. I obtained my PhD from the Technical University of Munich under the supervision of Prof. Sami Haddadin. Before joining NVIDIA, I was the head of the AI department at Franka Robotics GmbH, the creator of the most popular research robot worldwide. My research at NVIDIA focuses on two main aspects. First, how to design intelligent robotic systems such that they are deployable in the real world. Second, how to model manipulation such that robots can solve complex tasks with performance and robustness similar to that of humans.

TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models

Diffusion models have opened the path to a wide range of text-based image editing frameworks. However, these typically build on the multi-step nature of the backward diffusion process, and adapting them to distilled, fast-sampling methods has proven surprisingly challenging. Here, we focus on a popular line of text-based editing frameworks: the "edit-friendly" DDPM-noise inversion approach. We analyze its application to fast sampling methods and categorize its failures into two classes: the appearance of visual artifacts, and insufficient editing strength.
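
For context, the "edit-friendly" inversion referenced here extracts, for each timestep, a noise map that makes the DDPM sampling equation reproduce the input image. Below is a minimal sketch of that general recipe; predict_mu (the model's posterior-mean estimate), the schedule tensors alphas_bar and sigmas, and the exact indexing are all assumptions of mine rather than this paper's formulation:

```python
import torch

def edit_friendly_inversion(x0, alphas_bar, sigmas, predict_mu, T):
    """Return noise maps z_1..z_T that reproduce x0 when sampling."""
    xs = [x0]
    for t in range(1, T + 1):
        # Sample x_t independently from q(x_t | x0).
        eps = torch.randn_like(x0)
        xs.append(alphas_bar[t].sqrt() * x0 + (1 - alphas_bar[t]).sqrt() * eps)
    zs = []
    for t in range(T, 0, -1):
        # Solve x_{t-1} = mu_t(x_t) + sigma_t * z_t for z_t.
        zs.append((xs[t - 1] - predict_mu(xs[t], t)) / sigmas[t])
    return list(reversed(zs))
```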

LCM-Lookahead for Encoder-based Text-to-Image Personalization

Recent advancements in diffusion models have introduced fast sampling methods that can effectively produce high-quality images in just one or a few denoising steps. Interestingly, when these are distilled from existing diffusion models, they often maintain alignment with the original model, retaining similar outputs for similar prompts and seeds. These properties present opportunities to leverage fast sampling methods as a shortcut mechanism, using them to create a preview of denoised outputs through which we can backpropagate image-space losses.
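
A hedged sketch of that shortcut idea follows, assuming an epsilon-parameterized distilled model; fast_model, image_loss, and the one-step x0 reconstruction below are my assumptions rather than the paper's exact formulation:

```python
import torch

def lookahead_loss(x_t, t, fast_model, alphas_bar, image_loss, target):
    # One-step denoised preview from the distilled model (eps-parameterized):
    # x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps  =>  solve for x0.
    eps = fast_model(x_t, t)
    x0_preview = (x_t - (1 - alphas_bar[t]).sqrt() * eps) / alphas_bar[t].sqrt()
    # Gradients of the image-space loss flow back through x_t.
    return image_loss(x0_preview, target)
```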

Consolidating Attention Features for Multi-view Image Editing

Large-scale text-to-image models enable a wide range of image editing techniques, using text prompts or even spatial controls. However, applying these editing methods to multi-view images depicting a single scene leads to 3D-inconsistent results. In this work, we focus on spatial control-based geometric manipulations and introduce a method to consolidate the editing process across various views.
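
One plausible reading of "consolidating attention features" is extended self-attention, where each view attends to keys and values gathered from all views so edits stay consistent in 3D. The sketch below illustrates that general mechanism under my own assumptions about shapes and aggregation; it is not this paper's method:

```python
import torch
import torch.nn.functional as F

def multi_view_attention(q, ks, vs):
    """q: (tokens, dim) for one view; ks, vs: per-view lists of (tokens, dim)."""
    k = torch.cat(ks, dim=0)  # consolidate keys across all views
    v = torch.cat(vs, dim=0)  # consolidate values across all views
    attn = F.softmax(q @ k.T / k.shape[-1] ** 0.5, dim=-1)
    return attn @ v
```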

Enze Xie

Enze Xie is a Senior Research Scientist at NVIDIA Research. Previously, he was a Principal Researcher and Research Lead at Huawei Noah's Ark Lab (Hong Kong). He obtained his PhD from HKU MMLab in 2022. His current research focuses mainly on multimodal generation, understanding, and acceleration.