TexFusion: Synthesizing 3D Textures with Text-Guided Image Diffusion Models

We present TexFusion (Texture Diffusion), a new method to synthesize textures for given 3D geometries, using large-scale text-guided image diffusion models. In contrast to recent works that leverage 2D text-to-image diffusion models to distill 3D objects using a slow and fragile optimization process, TexFusion introduces a new 3D-consistent generation technique specifically designed for texture synthesis that employs regular diffusion model sampling on different 2D rendered views.

NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models

Automatically generating high-quality real world 3D scenes is of enormous interest for applications such as virtual reality and robotics simulation. Towards this goal, we introduce NeuralField-LDM, a generative model capable of synthesizing complex 3D environments. We leverage Latent Diffusion Models that have been successfully utilized for efficient high-quality 2D content creation. We first train a scene auto-encoder to express a set of image and pose pairs as a neural field, represented as density and feature voxel grids that can be projected to produce novel views of the scene.

Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models

Latent Diffusion Models (LDMs) enable high-quality image synthesis while avoiding excessive compute demands by training a diffusion model in a compressed lower-dimensional latent space. Here, we apply the LDM paradigm to high-resolution video generation, a particularly resource-intensive task. We first pre-train an LDM on images only; then, we turn the image generator into a video generator by introducing a temporal dimension to the latent space diffusion model and fine-tuning on encoded image sequences, i.e., videos.
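The core trick of turning a pre-trained image LDM into a video model is treating the time axis carefully: spatial layers from the image model can process all frames by folding time into the batch dimension, while newly introduced temporal layers mix information across frames. A minimal shape-level sketch of that folding (all sizes here are illustrative, not the paper's actual configuration):

```python
import numpy as np

# Illustrative latent-space sizes: batch, time (frames), channels, height, width
B, T, C, H, W = 2, 8, 4, 16, 16
z = np.random.randn(B, T, C, H, W)  # encoded video in the compressed latent space

# Spatial (image) layers treat frames independently: fold time into batch
z_spatial = z.reshape(B * T, C, H, W)
assert z_spatial.shape == (16, 4, 16, 16)

# A temporal mixing layer instead operates along T: unfold and move time
# next to the spatial dims so each pixel location sees its frame sequence
z_temporal = z_spatial.reshape(B, T, C, H, W).transpose(0, 2, 1, 3, 4)
assert z_temporal.shape == (2, 4, 8, 16, 16)
```

This only demonstrates the tensor bookkeeping; the actual model interleaves learned spatial and temporal network layers at each of these stages.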

Karsten Kreis

Karsten Kreis is a Principal Research Scientist at NVIDIA Research focusing on generative AI.

Karsten's research interests span both the development of foundational generative AI algorithms and their application across scientific and creative domains. Recently, he has been focusing on generative learning for molecular modeling and is leading efforts in generative modeling for protein design.

Generalizable One-shot 3D Neural Head Avatar

We present a method that reconstructs and animates a 3D head avatar from a single-view portrait image. Existing methods either involve time-consuming optimization for a specific person with multiple images, or they struggle to synthesize intricate appearance details beyond the facial region. To address these limitations, we propose a framework that not only generalizes to unseen identities based on a single-view image without requiring person-specific optimization, but also captures characteristic details within and beyond the face area (e.g., hairstyle and accessories).

Convolutional State Space Models for Long-Range Spatiotemporal Modeling

Effectively modeling long spatiotemporal sequences is challenging due to the need to model complex spatial correlations and long-range temporal dependencies simultaneously. ConvLSTMs attempt to address this by updating tensor-valued states with recurrent neural networks, but their sequential computation makes them slow to train. In contrast, Transformers can process an entire spatiotemporal sequence, compressed into tokens, in parallel. However, the cost of attention scales quadratically in length, limiting their scalability to longer sequences.
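The quadratic scaling mentioned above comes from the attention matrix itself: computing query-key similarities for a sequence of length n produces an n-by-n matrix. A tiny back-of-the-envelope sketch (the cost function is a simplified illustration, not the paper's analysis):

```python
def attention_cost(seq_len: int, dim: int) -> int:
    # The QK^T similarity matrix has seq_len * seq_len entries,
    # each a dot product over `dim` channels, so cost grows as O(n^2 * d).
    return seq_len * seq_len * dim

# Doubling the sequence length quadruples the attention cost
ratio = attention_cost(2048, 64) / attention_cost(1024, 64)
assert ratio == 4.0
```

State-space models avoid this by replacing attention with recurrences or convolutions whose cost grows linearly (or near-linearly) in sequence length.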

Huck Yang

I am a Senior Research Scientist at NVIDIA Research.

I obtained my Ph.D. and M.Sc. from the Georgia Institute of Technology, USA, with a Wallace H. Coulter fellowship, and my B.Sc. from National Taiwan University.

My primary research lies in the areas of multilingual model alignment and speech-language modeling.

Dale Durran

Durran holds a 25% appointment as a Principal Research Scientist in Climate Modeling at NVIDIA and a 60% appointment as a Professor of Atmospheric Sciences at the University of Washington. At NVIDIA, his research focuses on deep-learning earth-system modeling for sub-seasonal and seasonal forecasting, forecast ensembles, and generative methods for fine-scale modeling of convective precipitation and other mesoscale fields.