Learning to Track Instances without Video Annotations

Tracking segmentation masks of multiple instances has been intensively studied, but still faces two fundamental challenges: 1) the requirement of large-scale, frame-wise annotation, and 2) the complexity of two-stage approaches. To resolve these challenges, we introduce a novel semi-supervised framework that learns instance tracking networks from only a labeled image dataset and unlabeled video sequences. With an instance contrastive objective, we learn an embedding that discriminates each instance from the others.
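
As a rough illustration of what an instance contrastive objective can look like, the sketch below computes an InfoNCE-style loss for one instance embedding. This is an assumption made for illustration, not the loss from the paper: the function names, the cosine similarity, and the temperature value are all hypothetical choices.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def instance_contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (illustrative, not the paper's exact objective).

    anchor    : embedding of an instance in one frame
    positive  : embedding of the same instance in another frame or view
    negatives : embeddings of other instances
    """
    pos_logit = cosine(anchor, positive) / temperature
    logits = [pos_logit] + [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the max subtracted for numerical stability.
    top = max(logits)
    log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
    # Negative log-probability of picking the positive among all candidates.
    return log_denominator - pos_logit


anchor = [1.0, 0.0]
# Positive close to the anchor, negatives far away: small loss.
well_separated = instance_contrastive_loss(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
# Positive far from the anchor, one negative close to it: large loss.
confused = instance_contrastive_loss(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
```

The loss is small when the anchor is closer to its positive than to any negative, which is the sense in which such an embedding discriminates each instance from the others.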

Weakly-Supervised Physically Unconstrained Gaze Estimation

A major challenge for physically unconstrained gaze estimation is acquiring training data with 3D gaze annotations for in-the-wild and outdoor scenarios. In contrast, videos of human interactions in unconstrained environments are abundantly available and can be much more easily annotated with frame-level activity labels. In this work, we tackle the previously unexplored problem of weakly-supervised gaze estimation from videos of human interactions.

Contrastive Syn-to-Real Generalization

Training on synthetic data can be beneficial for label-scarce or data-scarce scenarios. However, synthetically trained models often suffer from poor generalization in real domains due to domain gaps. In this work, we make a key observation that the diversity of the learned feature embeddings plays an important role in the generalization performance.

Bertrand Douillard

Bertrand has focused on AI for robotics since his Ph.D. in the field. He brings experience from engineering and research roles at JPL, Zoox, Toyota Research Institute, and Waymo. His hands-on work has spanned the full range of robotics systems, from classical to end-to-end learned stacks, including perception, planning, controls, and the offline ML pipelines that support them. As part of his transition to NVIDIA Research, his current focus is on end-to-end autonomous vehicle models built on Recurrent State Space Models (RSSMs) and refined with Reinforcement Fine Tuning.

GauRast: Enhancing GPU Triangle Rasterizers to Accelerate 3D Gaussian Splatting

3D intelligence leverages rich 3D features and stands as a promising frontier in AI, with 3D rendering fundamental to many downstream applications. 3D Gaussian Splatting (3DGS), an emerging high-quality 3D rendering method, requires significant computation, making real-time execution on existing GPU-equipped edge devices infeasible. Previous efforts to accelerate 3DGS rely on dedicated accelerators that require substantial integration overhead and hardware costs.

GEM: GPU-Accelerated Emulator-Inspired RTL Simulation

We present a GPU-accelerated RTL simulator addressing critical challenges in high-speed circuit verification. Traditional CPU-based RTL simulators struggle with scalability and performance, and while FPGA-based emulators offer acceleration, they are costly and less accessible. Previous GPU-based attempts have failed to speed up RTL simulation due to the heterogeneous nature of circuit partitions, which conflicts with the SIMT (Single Instruction, Multiple Thread) paradigm of GPUs.

Adaptive Algebraic Reuse of Reordering in Cholesky Factorizations with Dynamic Sparsity Patterns

We introduce Parth, a fill-reducing ordering method for sparse Cholesky solvers with dynamic sparsity patterns (e.g., in physics simulations with contact or geometry processing with local remeshing). Parth facilitates the selective reuse of fill-reducing orderings when sparsity patterns exhibit temporal coherence, avoiding full symbolic analysis by localizing the effect of dynamic sparsity changes on the ordering vector.
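
To make the notion of a fill-reducing ordering concrete, the sketch below counts the fill-in produced by symbolic elimination on the graph of a sparse symmetric matrix. It is a generic textbook illustration, not Parth's algorithm; the function name `count_fill` and the star-shaped example graph are assumptions chosen for brevity.

```python
def count_fill(adjacency, order):
    """Count fill edges created by eliminating vertices in the given order.

    adjacency : dict mapping each vertex to the set of its neighbors
                (the off-diagonal nonzero pattern of a symmetric matrix)
    order     : elimination order, i.e. the ordering vector
    """
    adj = {v: set(nbrs) for v, nbrs in adjacency.items()}  # work on a copy
    fill = 0
    for v in order:
        nbrs = sorted(adj.pop(v))
        for u in nbrs:
            adj[u].discard(v)
        # Eliminating v connects all of its remaining neighbors pairwise;
        # every edge that did not exist before is a fill-in entry.
        for i, a in enumerate(nbrs):
            for b in nbrs[i + 1:]:
                if b not in adj[a]:
                    adj[a].add(b)
                    adj[b].add(a)
                    fill += 1
    return fill


# "Arrow" pattern: vertex 0 is coupled to every other vertex.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
hub_first = count_fill(star, [0, 1, 2, 3, 4])
hub_last = count_fill(star, [1, 2, 3, 4, 0])
```

Eliminating the hub first fills in every pair of leaves, while eliminating it last creates no fill at all. Computing a good ordering is itself costly, which is why reusing the parts of an ordering unaffected by a local sparsity change can pay off.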