Syntactic Binding in Diffusion Models: Enhancing Attribute Correspondence through Attention Map Alignment

Text-conditioned image generation models often generate incorrect associations between entities and their visual attributes. This reflects an impaired mapping between linguistic binding of entities and modifiers in the prompt and visual binding of the corresponding elements in the generated image. As one notable example, a query like "a yellow tomato and a red lemon" may incorrectly produce an image of a yellow lemon and a red tomato.

Online Overexposed Pixels Hallucination in Videos with Adaptive Reference Frame Selection

Low dynamic range (LDR) cameras cannot deal with wide dynamic range inputs, frequently leading to local overexposure issues. We present a learning-based system to reduce these artifacts without resorting to complex acquisition mechanisms like alternating exposures or costly processing that are typical of high dynamic range (HDR) imaging. We propose a transformer-based deep neural network (DNN) to infer the missing HDR details. In an ablation study, we show the importance of using a multiscale DNN and training it with the proper cost function to achieve state-of-the-art quality.

Rethinking Display Requirements for Esports and High Interactivity Applications

Media technology is continuing its transition from passive streaming to participatory interactive experiences, including well-known applications such as web browsing, video conferencing and gaming, as well as emerging and more demanding uses like AR/MR/VR and esports. How should display traits such as latency, refresh rate and size change to meet this trend? We review recent studies from NVIDIA Research and others on requirements for esports as the cutting edge of this trend toward interactivity, and discuss the studies’ implications for other interactive applications.

Verification and Synthesis of Robust Control Barrier Functions: Multilevel Polynomial Optimization and Semidefinite Relaxation

We study the problem of verification and synthesis of robust control barrier functions (CBFs) for control-affine polynomial systems with bounded additive uncertainty and convex polynomial constraints on the control. We first formulate robust CBF verification and synthesis as multilevel polynomial optimization problems (POPs), where verification optimizes – in three levels – the uncertainty, control, and state, while synthesis additionally optimizes the parameter of a chosen parametric CBF candidate.

Interpretable Trajectory Prediction for Autonomous Vehicles via Counterfactual Responsibility

The ability to anticipate surrounding agents’ behaviors is critical to enable safe and seamless autonomous vehicles (AVs). While phenomenological methods have successfully predicted future trajectories from scene context, these predictions lack interpretability. On the other hand, ontological approaches assume an underlying structure able to describe the interaction dynamics or agents’ internal decision processes. Still, they often suffer from poor scalability or cannot reflect diverse human behaviors.

Bayesian Reparameterization of Reward-Conditioned Reinforcement Learning with Energy-based Models

Recently, reward-conditioned reinforcement learning (RCRL) has gained popularity due to its simplicity, flexibility, and off-policy nature. However, we will show that current RCRL approaches are fundamentally limited and fail to address two critical challenges of RCRL – improving generalization on high reward-to-go (RTG) inputs, and avoiding out-of-distribution (OOD) RTG queries at test time. To address these challenges when training vanilla RCRL architectures, we propose Bayesian Reparameterized RCRL (BR-RCRL), a novel set of inductive biases for RCRL inspired by Bayes’ theorem.

Refining Obstacle Perception Safety Zones via Maneuver-Based Decomposition

A critical task for developing safe autonomous driving stacks is to determine whether an obstacle is safety-critical, i.e., poses an imminent threat to the autonomous vehicle. Our previous work showed that Hamilton-Jacobi reachability theory can be applied to compute interaction-dynamics-aware perception safety zones that better inform an ego vehicle’s perception module about which obstacles are considered safety-critical.

Simon Cooksey

Investigating memory consistency models in hardware and in programming models.

Max Zhaoshuo Li

I am a Research Scientist at NVIDIA Research, working on improving AI's understanding of 3D. I received my PhD from Johns Hopkins University and my Bachelor's degree from the University of British Columbia. 

Jaesung Choe

Hi, I am Jaesung Choe. My research lies in 3D computer vision. Please visit my personal website! https://jaesung-choe.github.io/