ChipNeMo: Domain-Adapted LLMs for Chip Design

ChipNeMo aims to explore the applications of large language models (LLMs) for industrial chip design. Instead of directly deploying off-the-shelf commercial or open-source LLMs, we adopt the following domain adaptation techniques: custom tokenizers, domain-adaptive continued pretraining, supervised fine-tuning (SFT) with domain-specific instructions, and domain-adapted retrieval models. We evaluate these methods on three selected LLM applications for chip design: an engineering assistant chatbot, EDA script generation, and bug summarization and analysis.
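The recipe above can be illustrated with a minimal, hedged sketch of the first two steps, tokenizer augmentation and domain-adaptive continued pretraining. This is not ChipNeMo's actual pipeline; the base model, the added tokens, and the toy training corpus below are placeholder assumptions, using the Hugging Face transformers and datasets APIs.

```python
# Hedged sketch of (1) tokenizer augmentation and (2) continued pretraining.
# Not ChipNeMo's implementation: "gpt2" is a small stand-in model, and the
# domain tokens / toy corpus are invented placeholders.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments)

base = "gpt2"  # stand-in for a much larger foundation model
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# 1. Custom tokenizer: add tokens frequent in chip-design text (hypothetical examples).
tokenizer.add_tokens(["always_ff", "posedge", "uvm_sequence"])
model.resize_token_embeddings(len(tokenizer))  # new embedding rows are randomly initialized

# 2. Domain-adaptive continued pretraining on in-domain text (toy two-line corpus).
texts = ["always_ff @(posedge clk) q <= d;", "assign out = a & b;"]
enc = tokenizer(texts, truncation=True, padding="max_length", max_length=32)
enc["labels"] = [ids.copy() for ids in enc["input_ids"]]  # causal-LM labels (padding not masked here)
train_ds = Dataset.from_dict(dict(enc))

args = TrainingArguments(output_dir="dapt_ckpt", per_device_train_batch_size=2,
                         num_train_epochs=1, learning_rate=1e-5, report_to="none")
Trainer(model=model, args=args, train_dataset=train_ds).train()
```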

Norm-guided latent space exploration for text-to-image generation

Text-to-image diffusion models show great potential in synthesizing a large variety of concepts in new compositions and scenarios. However, their latent seed space is still not well understood and has been shown to have an impact on generating new and rare concepts. Specifically, simple operations like interpolation and centroid finding work poorly with the standard Euclidean and spherical metrics in the latent space. This paper makes the observation that current training procedures make diffusion models biased toward inputs with a narrow range of norm values.
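The norm bias can be seen with a small numerical sketch (a simplified illustration, not the paper's exact norm-guided procedure; the latent dimensionality is an assumed Stable-Diffusion-like value): Gaussian seed norms concentrate near the square root of the dimension, so a naive Euclidean midpoint falls well below the norm band the model was trained on, while rescaling the interpolant back toward that band keeps it in distribution.

```python
# Simplified sketch of the norm effect (not the paper's exact method).
import torch

d = 4 * 64 * 64                       # assumed Stable-Diffusion-like latent size
z0, z1 = torch.randn(d), torch.randn(d)
t = 0.5

lerp = (1 - t) * z0 + t * z1          # naive Euclidean interpolation
print(lerp.norm() / d ** 0.5)         # ~0.71: well below the typical seed norm

target = (1 - t) * z0.norm() + t * z1.norm()   # stay inside the narrow norm band
norm_guided = lerp * (target / lerp.norm())
print(norm_guided.norm() / d ** 0.5)  # ~1.0: back in the range the model expects
```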

Syntactic Binding in Diffusion Models: Enhancing Attribute Correspondence through Attention Map Alignment

Text-conditioned image generation models often generate incorrect associations between entities and their visual attributes. This reflects an impaired mapping between linguistic binding of entities and modifiers in the prompt and visual binding of the corresponding elements in the generated image. As one notable example, a query like "a yellow tomato and a red lemon" may incorrectly produce an image of a yellow lemon and a red tomato.
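One way to encourage the intended binding is to align the cross-attention maps of a modifier with those of the entity it modifies. The sketch below is a hedged illustration of such an alignment loss, not the paper's exact objective; the map size and the symmetric-KL choice are assumptions.

```python
# Hedged sketch of an attention-map alignment loss (not the paper's exact losses).
import torch

def symmetric_kl(p: torch.Tensor, q: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Symmetric KL divergence between two HxW cross-attention maps."""
    p = p.flatten() + eps
    q = q.flatten() + eps
    p, q = p / p.sum(), q / q.sum()
    return (p * (p / q).log()).sum() + (q * (q / p).log()).sum()

# Toy 16x16 maps; in practice these come from the denoiser's cross-attention layers
# for the entity token ("tomato") and its syntactically bound modifier ("yellow").
attn_entity, attn_modifier = torch.rand(16, 16), torch.rand(16, 16)

binding_loss = symmetric_kl(attn_entity, attn_modifier)
# During sampling, the latent can be nudged along the negative gradient of this
# loss so that bound token pairs attend to the same image regions.
print(binding_loss)
```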

Online Overexposed Pixels Hallucination in Videos with Adaptive Reference Frame Selection

Low dynamic range (LDR) cameras cannot deal with wide dynamic range inputs, frequently leading to local overexposure issues. We present a learning-based system to reduce these artifacts without resorting to complex acquisition mechanisms like alternating exposures or costly processing that are typical of high dynamic range (HDR) imaging. We propose a transformer-based deep neural network (DNN) to infer the missing HDR details. In an ablation study, we show the importance of using a multiscale DNN and of training it with a proper cost function to achieve state-of-the-art quality.
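As a hedged illustration of what such a cost function can look like (the actual training loss is not specified here; the log-domain L1 and the scale set are assumptions), a multiscale reconstruction loss compares the prediction and the HDR reference at several resolutions so that both coarse exposure structure and fine detail are penalized:

```python
# Hedged sketch of a multiscale reconstruction cost (assumed, not the paper's exact loss).
import torch
import torch.nn.functional as F

def multiscale_l1(pred: torch.Tensor, target: torch.Tensor, scales=(1, 2, 4)) -> torch.Tensor:
    """pred, target: (N, C, H, W) linear-HDR images."""
    loss = pred.new_zeros(())
    for s in scales:
        p = F.avg_pool2d(pred, s) if s > 1 else pred
        t = F.avg_pool2d(target, s) if s > 1 else target
        # Compare in the log domain to compress the HDR range.
        loss = loss + F.l1_loss(torch.log1p(p), torch.log1p(t))
    return loss / len(scales)

pred = torch.rand(1, 3, 128, 128) * 4.0    # toy network output
target = torch.rand(1, 3, 128, 128) * 4.0  # toy HDR reference
print(multiscale_l1(pred, target))
```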

Rethinking Display Requirements for Esports and High Interactivity Applications

Media technology is continuing its transition from passive streaming to participatory interactive experiences, including well-known applications such as web browsing, video conferencing and gaming, as well as emerging and more demanding uses like AR/MR/VR and esports. How should display traits such as latency, refresh rate and size change to meet this trend? We review recent studies from NVIDIA Research and others on requirements for esports as the cutting edge of this trend toward interactivity, and discuss the studies’ implications for other interactive applications.

Verification and Synthesis of Robust Control Barrier Functions: Multilevel Polynomial Optimization and Semidefinite Relaxation

We study the problem of verification and synthesis of robust control barrier functions (CBFs) for control-affine polynomial systems with bounded additive uncertainty and convex polynomial constraints on the control. We first formulate robust CBF verification and synthesis as multilevel polynomial optimization problems (POPs), where verification optimizes, in three levels, over the uncertainty, control, and state, while synthesis additionally optimizes over the parameters of a chosen parametric CBF candidate.
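In standard CBF notation (a hedged sketch; the paper's exact formulation, sign conventions, and constraint sets may differ), the three levels of the verification problem can be written as a nested polynomial optimization whose nonnegativity certifies the candidate:

```latex
% Hedged sketch in standard CBF notation; details may differ from the paper.
% Dynamics: \dot{x} = f(x) + g(x)u + w with u \in U (convex polynomial
% constraints) and w \in W (bounded additive uncertainty); h is the CBF
% candidate and \alpha an extended class-K function.
\rho^{\star} \;=\; \min_{x \,:\, h(x) \ge 0} \;\; \max_{u \in U} \;\; \min_{w \in W}
\Big[ \nabla h(x)^{\top} \big( f(x) + g(x)u + w \big) + \alpha\big( h(x) \big) \Big]
```

Verification asks whether this nested optimum is nonnegative; when the dynamics, the candidate, and the sets U, W are polynomial or semialgebraic, the result is a three-level POP amenable to semidefinite relaxation, and synthesis adds an outer optimization over the parameters of the parametric CBF candidate.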

Interpretable Trajectory Prediction for Autonomous Vehicles via Counterfactual Responsibility

The ability to anticipate surrounding agents’ behaviors is critical to enable safe and seamless autonomous vehicles (AVs). While phenomenological methods have successfully predicted future trajectories from scene context, these predictions lack interpretability. On the other hand, ontological approaches assume an underlying structure able to describe the interaction dynamics or agents’ internal decision processes. Still, they often suffer from poor scalability or cannot reflect diverse human behaviors.

Bayesian Reparameterization of Reward-Conditioned Reinforcement Learning with Energy-based Models

Recently, reward-conditioned reinforcement learning (RCRL) has gained popularity due to its simplicity, flexibility, and off-policy nature. However, we show that current RCRL approaches are fundamentally limited and fail to address two critical challenges of RCRL: improving generalization on high reward-to-go (RTG) inputs and avoiding out-of-distribution (OOD) RTG queries at test time. To address these challenges when training vanilla RCRL architectures, we propose Bayesian Reparameterized RCRL (BR-RCRL), a novel set of inductive biases for RCRL inspired by Bayes’ theorem.
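The core idea can be sketched with a worked equation (a hedged reading of the title, not necessarily the paper's exact parameterization): Bayes' theorem lets a return-conditioned policy be reparameterized through a generative model of returns, which can in turn be realized with an energy-based model.

```latex
% Hedged sketch, not necessarily the paper's exact objective.
% Return-conditioned policy reparameterized via Bayes' theorem:
\pi(a \mid s, R) \;=\;
\frac{p(R \mid s, a)\, \pi_{\mathrm{prior}}(a \mid s)}
     {\int p(R \mid s, a')\, \pi_{\mathrm{prior}}(a' \mid s)\, \mathrm{d}a'},
\qquad
p(R \mid s, a) \;\propto\; \exp\!\big(-E_{\phi}(s, a, R)\big)
```

Under such a factorization, the return model indicates which RTG values have support in a given state, which is one way to avoid querying the policy with out-of-distribution RTGs at test time.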

Refining Obstacle Perception Safety Zones via Maneuver-Based Decomposition

A critical task for developing safe autonomous driving stacks is to determine whether an obstacle is safety-critical, i.e., poses an imminent threat to the autonomous vehicle. Our previous work showed that Hamilton-Jacobi reachability theory can be applied to compute interaction-dynamics-aware perception safety zones that better inform an ego vehicle’s perception module about which obstacles are considered safety-critical.
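In standard Hamilton-Jacobi reachability notation (a hedged sketch; the prior work's exact game formulation and conventions may differ), such a safety zone can be viewed as a backward reachable tube of the collision set under worst-case interaction:

```latex
% Hedged sketch in standard HJ reachability notation; conventions may differ
% from the cited prior work. x is the ego-obstacle relative state, u the ego
% control, d the obstacle input (formally a nonanticipative strategy
% responding to u), \xi the resulting trajectory, and \mathcal{C} the collision set.
\mathrm{SafetyZone}(\tau) \;=\;
\Big\{\, x_0 \;:\; \exists\, d(\cdot) \ \text{such that} \ \forall\, u(\cdot), \
\exists\, t \in [0, \tau] \ \text{with} \ \xi\big(t;\, x_0, u(\cdot), d(\cdot)\big) \in \mathcal{C} \,\Big\}
```

Obstacles whose relative state lies inside this tube cannot be kept away from collision by any ego response within the horizon, so they are the ones flagged as safety-critical to the perception module; the tube itself is computed as a sublevel set of a Hamilton-Jacobi value function.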

Simon Cooksey

Investigating memory consistency models in hardware and in programming models.