Learning to Track Instances without Video Annotations

Tracking segmentation masks of multiple instances has been intensively studied, but still faces two fundamental challenges: 1) the requirement of large-scale, frame-wise annotation, and 2) the complexity of two-stage approaches. To resolve these challenges, we introduce a novel semi-supervised framework by learning instance tracking networks with only a labeled image dataset and unlabeled video sequences. With an instance contrastive objective, we learn an embedding to discriminate each instance from the others.

Weakly-Supervised Physically Unconstrained Gaze Estimation

A major challenge for physically unconstrained gaze estimation is acquiring training data with 3D gaze annotations for in-the-wild and outdoor scenarios. In contrast, videos of human interactions in unconstrained environments are abundantly available and can be much more easily annotated with frame-level activity labels. In this work, we tackle the previously unexplored problem of weakly-supervised gaze estimation from videos of human interactions.

Contrastive Syn-to-Real Generalization

Training on synthetic data can be beneficial for label or data-scarce scenarios. However, synthetically trained models often suffer from poor generalization in real domains due to domain gaps. In this work, we make a key observation that the diversity of the learned feature embeddings plays an important role in the generalization performance.

Sionna Research Kit: A GPU-Accelerated Research Platform for AI-RAN

We introduce the NVIDIA Sionna Research Kit, a GPU-accelerated research platform for developing and testing AI/ML algorithms in 5G NR cellular networks. Powered by the NVIDIA Jetson AGX Orin, the platform leverages accelerated computing to deliver high throughput and real-time signal processing, while offering the flexibility of a software-defined stack. Built on OpenAirInterface (OAI) [1], it unlocks a broad range of research opportunities.

SALAD: Self-Adaptive Link Adaptation

Adapting the modulation and coding scheme (MCS) to the wireless link quality is critical for maximizing spectral efficiency while ensuring reliability. We propose SALAD (selfadaptive link adaptation), an algorithm that exclusively leverages ACK/NACK feedback to reliably track the evolution of the signalto-interference-plus-noise ratio (SINR), achieving high spectral efficiency while keeping the long-term block error rate (BLER) near a desired target. SALAD infers the SINR by minimizing the cross-entropy loss between received ACK/NACKs and predicted BLER values.

Design of a Standard-Compliant Real-Time Neural Receiver for 5G NR

We detail the steps required to deploy a multiuser multiple-input multiple-output (MU-MIMO) neural receiver (NRX) in an actual cellular communication system. This raises several exciting research challenges, including the need for realtime inference and compatibility with the 5G NR standard. As the network configuration in a practical setup can change dynamically within milliseconds, we propose an adaptive NRX architecture capable of supporting dynamic modulation and coding scheme (MCS) configurations without the need for any re-training and without additional inference cost.

Better Together: Leveraging Multiple Digital Twins for Deployment Optimization of Airborne Base Stations

Airborne Base Stations (ABSs) allow for flexible geographical allocation of network resources with dynamically changing load as well as rapid deployment of alternate connectivity solutions during natural disasters. Since the radio infrastructure is carried by unmanned aerial vehicles (UAVs) with limited flight time, it is important to establish the best location for the ABS without exhaustive field trials.