Jindong Jiang

Jindong is a research scientist on the Learning and Perception Research (LPR) team at NVIDIA Research. Prior to joining NVIDIA, Jindong was a PhD student at Rutgers University under the supervision of Prof. Sungjin Ahn. His research interests lie at the intersection of representation learning and visual reasoning, with a strong interest in developing novel architectures that can improve agents' visual reasoning capabilities.

Fast Explicit-Input Assistance for Teleoperation in Clutter

Inference-Time Policy Steering through Human Interactions

Principles and guidelines for evaluating social robot navigation algorithms

Factory: Fast Contact for Robotic Assembly

Robotic assembly is one of the oldest and most challenging applications of robotics. In other areas of robotics, such as perception and grasping, simulation has rapidly accelerated research progress, particularly when combined with modern deep learning. However, accurately, efficiently, and robustly simulating the range of contact-rich interactions in assembly remains a longstanding challenge. In this work, we present Factory, a set of physics simulation methods and robot learning tools for such applications.

Multi-student Diffusion Distillation for Better One-step Generators

Diffusion models achieve high-quality sample generation at the cost of a lengthy multistep inference procedure. To overcome this, diffusion distillation techniques produce student generators capable of matching or surpassing the teacher in a single step. However, the student model’s inference speed is limited by the size of the teacher architecture, preventing real-time generation for computationally heavy applications. In this work, we introduce Multi-Student Distillation (MSD), a framework to distill a conditional teacher diffusion model into multiple single-step generators.

Score Distillation Sampling for Audio: Source Separation, Synthesis, and Beyond

We introduce Audio-SDS, a generalization of Score Distillation Sampling (SDS) to text-conditioned audio diffusion models. While SDS was initially designed for text-to-3D generation using image diffusion, its core idea of distilling a powerful generative prior into a separate parametric representation extends to the audio domain. Leveraging a single pretrained model, Audio-SDS enables a broad range of tasks without requiring specialized datasets.

LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models

This work explores expanding the capabilities of large language models (LLMs) pretrained on text to generate 3D meshes within a unified model. This offers two key advantages: (1) leveraging spatial knowledge already embedded in LLMs, derived from textual sources like 3D tutorials, and (2) enabling conversational 3D generation and mesh understanding. A primary challenge is effectively tokenizing 3D mesh data into discrete tokens that LLMs can process seamlessly.

ReMatching Dynamic Reconstruction Flow

Reconstructing a dynamic scene from image inputs is a fundamental computer vision task with many downstream applications. Despite recent advancements, existing approaches still struggle to achieve high-quality reconstructions from unseen viewpoints and timestamps. This work introduces the ReMatching framework, designed to improve reconstruction quality by incorporating deformation priors into dynamic reconstruction models.
