Learning to Track Instances without Video Annotations

Tracking segmentation masks of multiple instances has been intensively studied, but still faces two fundamental challenges: 1) the requirement of large-scale, frame-wise annotation, and 2) the complexity of two-stage approaches. To resolve these challenges, we introduce a novel semi-supervised framework by learning instance tracking networks with only a labeled image dataset and unlabeled video sequences. With an instance contrastive objective, we learn an embedding to discriminate each instance from the others.
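The instance contrastive objective can be sketched as an InfoNCE-style loss over per-instance embeddings; the function below is a minimal plain-Python illustration, not the authors' exact formulation (the temperature value and embedding handling are assumptions):

```python
import math

def instance_contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style objective: pull the anchor embedding toward the
    positive view of the same instance and push it away from the
    embeddings of other instances."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cosine(a, b):
        return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))

# A well-matched pair against a dissimilar instance yields a near-zero loss.
loss = instance_contrastive_loss([1.0, 0.0], [0.9, 0.1], [[-1.0, 0.0]])
```

Minimizing this loss drives embeddings of the same instance together and those of different instances apart, which is what makes the embedding discriminative across instances.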

Weakly-Supervised Physically Unconstrained Gaze Estimation

A major challenge for physically unconstrained gaze estimation is acquiring training data with 3D gaze annotations for in-the-wild and outdoor scenarios. In contrast, videos of human interactions in unconstrained environments are abundantly available and can be much more easily annotated with frame-level activity labels. In this work, we tackle the previously unexplored problem of weakly-supervised gaze estimation from videos of human interactions.

Contrastive Syn-to-Real Generalization

Training on synthetic data can be beneficial for label- or data-scarce scenarios. However, synthetically trained models often suffer from poor generalization in real domains due to domain gaps. In this work, we make a key observation that the diversity of the learned feature embeddings plays an important role in the generalization performance.
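As a rough illustration of what "diversity of the learned feature embeddings" can mean, the sketch below scores a set of embeddings by their mean pairwise cosine distance; the metric and variable names are illustrative assumptions, not the paper's measure:

```python
import math

def embedding_diversity(embeddings):
    """Score a set of feature embeddings by mean pairwise cosine distance:
    higher values mean the features are more spread out (more diverse)."""
    def cosine(a, b):
        num = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return num / (na * nb)

    n = len(embeddings)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(1.0 - cosine(embeddings[i], embeddings[j])
               for i, j in pairs) / len(pairs)

collapsed = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]  # all features identical
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]    # features cover more directions
```

Under this toy metric, collapsed features score 0 while spread-out features score higher, matching the intuition that a synthetically trained model whose features collapse generalizes worse.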

Liron Gantz

Dr. Liron Gantz has led Nvidia's Electro-Optics group (NVEO) for the past six years, driving advancements in silicon photonics for Co-Packaged Optics products. He joined Mellanox in 2016, where he established the electro-optics lab for device characterization, and completed his Ph.D. in 2017. He quickly transitioned into leadership roles, specializing in the design, characterization, and modeling of silicon photonic technologies. Dr. Gantz and his team have been instrumental in the Sagitta project, from its inception to chip development and Taipan system bring-up.

Learning to Compare Hardware Designs for High-Level Synthesis

High-level synthesis (HLS) is an automated design process that transforms high-level code into optimized hardware designs, enabling the rapid development of efficient hardware accelerators for applications such as image processing, machine learning, and signal processing. To achieve optimal performance, HLS tools rely on pragmas: directives inserted into the source code to guide the synthesis process. These pragmas can take many settings and values that significantly impact the resulting hardware design.
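The way pragma settings multiply into a large space of candidate designs can be sketched as follows; the pragma names and value ranges here are hypothetical, and real HLS tools define their own directives and legal values:

```python
from itertools import product

# Hypothetical pragma knobs for one loop nest; actual pragma names and
# legal values depend on the specific HLS tool.
pragma_options = {
    "PIPELINE": [True, False],
    "UNROLL_FACTOR": [1, 2, 4, 8],
    "ARRAY_PARTITION": ["none", "cyclic", "block"],
}

# Each combination of settings yields a distinct candidate hardware
# design, so the design space grows multiplicatively with every pragma.
designs = [dict(zip(pragma_options, values))
           for values in product(*pragma_options.values())]
```

Even these three toy knobs already produce 2 × 4 × 3 = 24 candidate designs, which is why comparing and ranking designs, rather than synthesizing each one, matters.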

AssertionForge: Enhancing Formal Verification Assertion Generation with Structured Representation of Specifications and RTL

Generating SystemVerilog Assertions (SVAs) from natural language specifications remains a major challenge in formal verification (FV) due to the inherent ambiguity and incompleteness of specifications. Existing LLM-based approaches, such as ASSERTLLM, focus on extracting information solely from specification documents, often failing to capture essential internal signal interactions and design details present in the RTL code, leading to incomplete or incorrect assertions.

Sanja Fidler

Sanja Fidler is vice president of AI research at NVIDIA, leading the company’s Spatial Intelligence Lab in Toronto. She is also an associate professor at the University of Toronto and an affiliate faculty member at the Vector Institute, which she co-founded. Previously, she was a research assistant professor at the Toyota Technological Institute at Chicago, a philanthropically endowed academic institute on the University of Chicago campus.

GRS: Generating robotic simulation tasks from real-world images

Game design hinges on understanding how static rules and content translate into dynamic player behavior, something modern generative systems that inspect only a game's code or assets struggle to capture. We present an automated design iteration framework that closes this gap by pairing a reinforcement learning (RL) agent, which playtests the game, with a large multimodal model (LMM), which revises the game based on what the agent does.
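The agent-model loop described above can be sketched as follows; all class and method names are illustrative stand-ins, not the system's actual API:

```python
def design_iteration_loop(game, rl_agent, lmm, num_iterations=3):
    """One automated design-iteration cycle: the RL agent playtests the
    current game, and the LMM revises the game based on the behavior
    it observes."""
    for _ in range(num_iterations):
        traces = rl_agent.playtest(game)  # dynamic player behavior
        game = lmm.revise(game, traces)   # rule/content changes
    return game

# Stub components so the loop can run end to end.
class StubAgent:
    def playtest(self, game):
        return f"traces(v{game['version']})"

class StubLMM:
    def revise(self, game, traces):
        return {"version": game["version"] + 1, "last_traces": traces}

game = design_iteration_loop({"version": 0, "last_traces": None},
                             StubAgent(), StubLMM())
```

The key design point is that the reviser only ever sees behavior traces from actual playtesting, not just the game's static code or assets.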

Helix Parallelism: Rethinking Sharding Strategies for Interactive Multi-Million-Token LLM Decoding

As LLMs scale to multi-million-token KV histories, real-time autoregressive decoding under tight Token-to-Token Latency (TTL) constraints faces growing pressure. Two core bottlenecks dominate: accessing Feed-Forward Network (FFN) weights and reading long KV caches. While Tensor Parallelism (TP) helps mitigate the cost of FFN weight reads, it does not scale well for attention. When TP width exceeds the number of KV heads, it leads to inefficient KV duplication, limits parallelism, and constrains batch size.
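The KV duplication effect can be illustrated with a bit of arithmetic: once the TP width exceeds the number of KV heads, each head must be replicated across GPUs. The sketch below assumes a simple even-replication scheme for illustration, not Helix's actual sharding:

```python
def kv_duplication_factor(tp_width, num_kv_heads):
    """How many GPUs end up holding a copy of each KV head when attention
    is sharded with tensor parallelism of the given width."""
    if tp_width <= num_kv_heads:
        return 1  # each GPU holds distinct KV heads, no duplication
    # Ceil-division: GPUs sharing (and re-reading) the same KV cache copy.
    return -(-tp_width // num_kv_heads)

# With 8 KV heads, TP=8 is duplication-free, but TP=32 stores each head
# on 4 GPUs, multiplying KV cache memory and read traffic by 4.
```

This is why widening TP past the KV-head count stops paying off for attention even as it keeps reducing per-GPU FFN weight reads.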