Augmenting Lane Perception and Topology Understanding with Standard Definition Navigation Maps

Autonomous driving has traditionally relied heavily on costly and labor-intensive High Definition (HD) maps, hindering scalability. In contrast, Standard Definition (SD) maps are more affordable and have worldwide coverage, offering a scalable alternative. In this work, we systematically explore the effect of SD maps on real-time lane-topology understanding.

Multi-Predictor Fusion: Combining Learning-based and Rule-based Trajectory Predictors

Trajectory prediction modules are key enablers for safe and efficient planning of autonomous vehicles (AVs), particularly in highly interactive traffic scenarios. Recently, learning-based trajectory predictors have achieved state-of-the-art performance, owing to their ability to learn the multimodal behaviors of other agents from data. In this paper, we present an algorithm called multi-predictor fusion (MPF) that augments the performance of learning-based predictors by imbuing them with motion planners that are tasked with satisfying logic-based rules.

Language Conditioned Traffic Generation

Simulation forms the backbone of modern self-driving development. Simulators help develop, test, and improve driving systems without putting humans, vehicles, or their environment at risk. However, simulators face a major challenge: they rely on content that is realistic and scalable, yet interesting. While recent advances in rendering and scene reconstruction have made great strides in creating static scene assets, modeling their layout, dynamics, and behaviors remains challenging. In this work, we turn to language as a source of supervision for dynamic traffic scene generation.

PAC-Bayes Generalization Certificates for Learned Inductive Conformal Prediction

Inductive Conformal Prediction (ICP) provides a practical and effective approach for equipping deep learning models with uncertainty estimates in the form of set-valued predictions which are guaranteed to contain the ground truth with high probability. Despite the appeal of this coverage guarantee, these sets may not be efficient: the size and contents of the prediction sets are not directly controlled, and instead depend on the underlying model and choice of score function.
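To make the coverage guarantee and the efficiency concern concrete, here is a minimal sketch of standard split (inductive) conformal prediction for classification. It is not the PAC-Bayes method of this paper. It assumes the common nonconformity score 1 - p(y | x) and uses a hypothetical synthetic "model" (`synthetic_classifier_outputs`) in place of a trained network. The threshold is the k-th smallest calibration score with k = ceil((n + 1)(1 - alpha)), which gives marginal coverage of at least 1 - alpha for exchangeable data. The average set size it reports is the quantity the guarantee does not control.

```python
from __future__ import annotations

import math

import numpy as np


def icp_threshold(cal_scores: np.ndarray, alpha: float) -> float:
    """Conformal threshold: the k-th smallest calibration score,
    using the finite-sample rank k = ceil((n + 1) * (1 - alpha))."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: every label must be included.
        return math.inf
    return float(np.sort(cal_scores)[k - 1])


def prediction_set(probs: np.ndarray, qhat: float) -> list[int]:
    """All labels whose nonconformity score 1 - p(y | x) is at most the threshold."""
    return [int(y) for y in np.flatnonzero(1.0 - probs <= qhat)]


def synthetic_classifier_outputs(rng: np.random.Generator, n: int, k: int = 5):
    """Hypothetical stand-in for a trained model: softmax probabilities that
    tend to favor the true label, returned together with the true labels."""
    labels = rng.integers(0, k, size=n)
    logits = rng.normal(size=(n, k))
    logits[np.arange(n), labels] += 2.0
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return probs, labels


def empirical_coverage(alpha: float = 0.1, n_cal: int = 1000, n_test: int = 2000, seed: int = 0):
    """Calibrate on one split, then measure coverage and mean set size on another."""
    rng = np.random.default_rng(seed)
    cal_probs, cal_labels = synthetic_classifier_outputs(rng, n_cal)
    # Score of each calibration example: one minus the probability of its true label.
    cal_scores = 1.0 - cal_probs[np.arange(n_cal), cal_labels]
    qhat = icp_threshold(cal_scores, alpha)

    test_probs, test_labels = synthetic_classifier_outputs(rng, n_test)
    sets = [prediction_set(p, qhat) for p in test_probs]
    coverage = np.mean([y in s for y, s in zip(test_labels, sets)])
    avg_size = np.mean([len(s) for s in sets])
    return float(coverage), float(avg_size)


coverage, avg_size = empirical_coverage()
print(f"coverage={coverage:.3f}, average set size={avg_size:.2f}")
```

Coverage should land near 0.9 for alpha = 0.1 regardless of how good the synthetic model is, while the average set size changes with the model and the score function, which is exactly the inefficiency the paragraph describes.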

trajdata: A Unified Interface to Multiple Human Trajectory Datasets

The field of trajectory forecasting has grown significantly in recent years, partially owing to the release of numerous large-scale, real-world human trajectory datasets for autonomous vehicles (AVs) and pedestrian motion tracking. While such datasets have been a boon for the community, each uses its own data format and API, making it cumbersome for researchers to train and evaluate methods across multiple datasets. To remedy this, we present trajdata: a unified interface to multiple human trajectory datasets.
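The core idea of a unified interface can be illustrated with the adapter pattern: each dataset-specific reader converts its native format into one shared record type, so training and evaluation code never sees per-dataset formats. The sketch below is hypothetical throughout. `AgentState`, `FrameTableAdapter`, `RecordAdapter`, and `UnifiedDataset` are invented names for illustration, not trajdata's actual API, and the two "source formats" are made up to show unit and timestamp normalization.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Protocol, Tuple


@dataclass(frozen=True)
class AgentState:
    """One agent at one timestep in a shared (hypothetical) schema: seconds and meters."""
    scene_id: str
    agent_id: str
    t: float
    x: float
    y: float


class DatasetAdapter(Protocol):
    """What every dataset-specific reader must provide."""
    name: str

    def states(self) -> Iterator[AgentState]: ...


class FrameTableAdapter:
    """Hypothetical source: rows of (frame, track_id, x_cm, y_cm) sampled at a fixed rate."""
    name = "frame_table"

    def __init__(self, scene_id: str, rows: List[Tuple[int, int, float, float]], hz: float = 10.0):
        self.scene_id, self.rows, self.hz = scene_id, rows, hz

    def states(self) -> Iterator[AgentState]:
        for frame, track, x_cm, y_cm in self.rows:
            # Convert frame indices to seconds and centimeters to meters.
            yield AgentState(self.scene_id, str(track), frame / self.hz, x_cm / 100.0, y_cm / 100.0)


class RecordAdapter:
    """Hypothetical source: dicts with microsecond timestamps and positions already in meters."""
    name = "records"

    def __init__(self, records: List[dict]):
        self.records = records

    def states(self) -> Iterator[AgentState]:
        for r in self.records:
            x, y = r["pos"]
            yield AgentState(r["scene"], r["id"], r["timestamp_us"] / 1e6, x, y)


class UnifiedDataset:
    """Downstream code sees one schema regardless of which dataset the data came from."""

    def __init__(self, adapters: Iterable[DatasetAdapter]):
        self.adapters = list(adapters)

    def trajectories(self) -> Dict[Tuple[str, str, str], List[AgentState]]:
        """Group states by (source, scene, agent) and sort each trajectory by time."""
        grouped: Dict[Tuple[str, str, str], List[AgentState]] = defaultdict(list)
        for adapter in self.adapters:
            for s in adapter.states():
                grouped[(adapter.name, s.scene_id, s.agent_id)].append(s)
        return {key: sorted(v, key=lambda s: s.t) for key, v in grouped.items()}


data = UnifiedDataset([
    FrameTableAdapter("scene_a", [(5, 7, 150.0, -200.0), (0, 7, 0.0, 0.0)]),
    RecordAdapter([{"scene": "s1", "id": "ped_1", "timestamp_us": 500000, "pos": (3.0, 4.0)}]),
])
for key, traj in data.trajectories().items():
    print(key, [(s.t, s.x, s.y) for s in traj])
```

Adding another dataset in this design means writing one more adapter; nothing downstream of `UnifiedDataset` changes.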

Interactive AI Material Generation and Editing in NVIDIA Omniverse

We present an AI-based tool for interactive material generation within the NVIDIA Omniverse environment. Our approach leverages a state-of-the-art latent diffusion model with several modifications that adapt it to the task of material generation. Specifically, we replace standard convolution layers with circular-padded convolution layers. Because circular padding fills each border with values from the opposite edge, features blend continuously across image boundaries, and the resulting textures tile seamlessly.
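Why circular padding yields seamless tiles can be checked directly. The sketch below is a single-channel NumPy illustration, not the tool's implementation (which operates on latent feature maps inside a diffusion model). It implements a "same"-size 2D cross-correlation, the operation deep-learning convolution layers compute, with either wrap-around or zero padding, and tests whether filtering one tile and repeating it 2x2 gives the same result as filtering the 2x2 repetition. The helper `tiles_seamlessly` is a hypothetical name introduced for this example. In PyTorch, the same padding behavior is available via `torch.nn.Conv2d(..., padding_mode="circular")`.

```python
import numpy as np


def conv2d_same(image: np.ndarray, kernel: np.ndarray, pad_mode: str = "wrap") -> np.ndarray:
    """'Same'-size 2D cross-correlation, as computed by deep-learning conv layers.

    pad_mode="wrap" gives circular padding; pad_mode="constant" gives zero padding.
    """
    kh, kw = kernel.shape
    if kh % 2 == 0 or kw % 2 == 0:
        raise ValueError("use odd kernel sizes so the output keeps the input's shape")
    ph, pw = kh // 2, kw // 2
    # With mode="wrap" the padding is copied from the opposite edge, so the image
    # behaves like a torus: the left neighbor of column 0 is the last column.
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode=pad_mode)
    h, w = image.shape
    out = np.zeros((h, w), dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out


def tiles_seamlessly(image: np.ndarray, kernel: np.ndarray, pad_mode: str) -> bool:
    """True if filtering one tile and repeating it 2x2 matches filtering the 2x2
    repetition, i.e. no seam appears where copies of the tile meet."""
    tiled_then_filtered = conv2d_same(np.tile(image, (2, 2)), kernel, pad_mode)
    filtered_then_tiled = np.tile(conv2d_same(image, kernel, pad_mode), (2, 2))
    return bool(np.allclose(tiled_then_filtered, filtered_then_tiled))


rng = np.random.default_rng(0)
texture = rng.random((8, 8))
blur = np.full((3, 3), 1.0 / 9.0)
print(tiles_seamlessly(texture, blur, "wrap"))      # True
print(tiles_seamlessly(texture, blur, "constant"))  # False
```

With zero padding, pixels along a tile's border are filtered against zeros, while in the repeated texture those same pixels sit next to real content from the neighboring copy; that mismatch is the visible seam that circular padding removes.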

FactorSim: Generative Simulation via Factorized Representation

Generating simulations to train intelligent agents in game-playing and robotics from natural language input, such as user input or task documentation, remains an open-ended challenge. Existing approaches address parts of this challenge, such as generating reward functions or task hyperparameters. Unlike previous work, we introduce FactorSim, which generates full simulations in code from language input that can be used to train agents.

Hanrong Ye

Hanrong Ye is currently a research scientist at NVIDIA Research, conducting research on multi-task, multi-media, and multi-modality models for machine understanding and generation.