Research

AssertionForge: Enhancing Formal Verification Assertion Generation with Structured Representation of Specifications and RTL

Generating SystemVerilog Assertions (SVAs) from natural language specifications remains a major challenge in formal verification (FV) due to the inherent ambiguity and incompleteness of specifications. Existing LLM-based approaches, such as ASSERTLLM, focus on extracting information solely from specification documents, often failing to capture essential internal signal interactions and design details present in the RTL code, leading to incomplete or incorrect assertions.

Sanja Fidler

Sanja Fidler is vice president of AI research at NVIDIA, leading the company’s Spatial Intelligence Lab in Toronto. She is also an associate professor at the University of Toronto and an affiliate faculty member at the Vector Institute, which she co-founded. Previously, she was a research assistant professor at Toyota Technological Institute at Chicago, a philanthropically endowed academic institute located on the University of Chicago campus.

GRS: Generating robotic simulation tasks from real-world images

Game design hinges on understanding how static rules and content translate into dynamic player behavior, something modern generative systems that inspect only a game's code or assets struggle to capture. We present an automated design iteration framework that closes this gap by pairing a reinforcement learning (RL) agent, which playtests the game, with a large multimodal model (LMM), which revises the game based on what the agent does.

Helix Parallelism: Rethinking Sharding Strategies for Interactive Multi-Million-Token LLM Decoding

As LLMs scale to multi-million-token KV histories, real-time autoregressive decoding under tight Token-to-Token Latency (TTL) constraints faces growing pressure. Two core bottlenecks dominate: accessing Feed-Forward Network (FFN) weights and reading long KV caches. While Tensor Parallelism (TP) helps mitigate the cost of FFN weight reads, it does not scale well for attention. When TP width exceeds the number of KV heads, it leads to inefficient KV duplication, limits parallelism, and constrains batch size.
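
The KV duplication point can be illustrated with a little arithmetic. The sketch below is not taken from the paper: it assumes the common sharding scheme in which KV heads are split evenly across tensor-parallel ranks and are replicated once the TP width exceeds the number of KV heads. The function name `kv_shard_stats` and the even-divisibility assumption are hypothetical and used only for illustration.

```python
import math


def kv_shard_stats(tp_width: int, num_kv_heads: int) -> tuple[int, int]:
    """Return (kv_heads_per_gpu, copies_of_each_kv_head).

    Illustrative assumption: KV heads are split evenly across TP ranks,
    and each head is replicated when there are more ranks than heads.
    """
    if tp_width < 1 or num_kv_heads < 1:
        raise ValueError("tp_width and num_kv_heads must be positive")
    # Each GPU must hold at least one KV head, so this never drops below 1.
    heads_per_gpu = max(1, math.ceil(num_kv_heads / tp_width))
    # Once tp_width > num_kv_heads, every head is stored on several GPUs.
    copies = max(1, math.ceil(tp_width / num_kv_heads))
    return heads_per_gpu, copies


if __name__ == "__main__":
    for tp in (4, 8, 32):
        heads, copies = kv_shard_stats(tp_width=tp, num_kv_heads=8)
        print(f"TP={tp}: {heads} KV head(s) per GPU, {copies} copy/copies of each head")
```

Under these assumptions, widening TP beyond the KV head count no longer shrinks the per-GPU KV cache. It only adds duplicate copies, which is the inefficiency the abstract describes.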

Bertrand Douillard

Bertrand has focused on AI for robotics since his Ph.D. in the field. He brings experience from engineering and research roles at JPL, Zoox, Toyota Research Institute, and Waymo. His hands-on work has spanned the full range of robotics systems, from classical to end-to-end learned stacks, including perception, planning, controls, and the offline ML pipelines that support them. As part of his transition to NVIDIA Research, his current focus is on end-to-end autonomous vehicle models built on Recurrent State Space Models (RSSMs) and refined with Reinforcement Fine Tuning.

Thomas Tian

GauRast: Enhancing GPU Triangle Rasterizers to Accelerate 3D Gaussian Splatting

3D intelligence leverages rich 3D features and stands as a promising frontier in AI, with 3D rendering fundamental to many downstream applications. 3D Gaussian Splatting (3DGS), an emerging high-quality 3D rendering method, requires significant computation, making real-time execution on existing GPU-equipped edge devices infeasible. Previous efforts to accelerate 3DGS rely on dedicated accelerators that require substantial integration overhead and hardware costs.

GEM: GPU-Accelerated Emulator-Inspired RTL Simulation

We present a GPU-accelerated RTL simulator addressing critical challenges in high-speed circuit verification. Traditional CPU-based RTL simulators struggle with scalability and performance, and while FPGA-based emulators offer acceleration, they are costly and less accessible. Previous GPU-based attempts have failed to speed up RTL simulation due to the heterogeneous nature of circuit partitions, which conflicts with the SIMT (Single Instruction, Multiple Thread) paradigm of GPUs.

Zeyuan Hu

VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
