Learning to Track Instances without Video Annotations

Tracking segmentation masks of multiple instances has been intensively studied, but still faces two fundamental challenges: 1) the requirement of large-scale, frame-wise annotation, and 2) the complexity of two-stage approaches. To resolve these challenges, we introduce a novel semi-supervised framework that learns instance tracking networks from only a labeled image dataset and unlabeled video sequences. With an instance contrastive objective, we learn an embedding to discriminate each instance from the others.

Weakly-Supervised Physically Unconstrained Gaze Estimation

A major challenge for physically unconstrained gaze estimation is acquiring training data with 3D gaze annotations for in-the-wild and outdoor scenarios. In contrast, videos of human interactions in unconstrained environments are abundantly available and can be much more easily annotated with frame-level activity labels. In this work, we tackle the previously unexplored problem of weakly-supervised gaze estimation from videos of human interactions.

Contrastive Syn-to-Real Generalization

Training on synthetic data can be beneficial for label- or data-scarce scenarios. However, synthetically trained models often suffer from poor generalization in real domains due to domain gaps. In this work, we make a key observation that the diversity of the learned feature embeddings plays an important role in the generalization performance.

Demystifying Data-Driven Probabilistic Medium-Range Weather Forecasting

The recent revolution in data-driven methods for weather forecasting has led to a fragmented landscape of complex, bespoke architectures and training strategies, obscuring the fundamental drivers of forecast accuracy. Here, we demonstrate that state-of-the-art probabilistic skill requires neither intricate architectural constraints nor specialized training heuristics. We introduce a scalable framework for learning multi-scale atmospheric dynamics by combining a directly downsampled latent space with a history-conditioned local projector that resolves high-resolution physics.

Learning Accurate Storm-Scale Evolution from Observations

Accurate short-term prediction of clouds and precipitation is critical for severe weather warnings, aviation safety, and renewable energy operations. Forecasts at this timescale are provided by numerical weather models and extrapolation methods, both of which have limitations. Mesoscale numerical weather prediction models provide skillful forecasts at these scales but require significant modeling expertise and computational infrastructure, which limits their accessibility. Extrapolation-based methods are computationally lightweight but degrade rapidly beyond 1-2

HealDA: Highlighting the importance of initial errors in end-to-end AI weather forecasts

Machine-learning (ML) weather models now rival leading numerical weather prediction (NWP) systems in medium-range skill. However, almost all still rely on NWP data assimilation (DA) to provide initial conditions, tying them to expensive infrastructure and limiting the practical speed and accuracy gains of ML. More recently, ML-based DA systems have been proposed, which are often trained and evaluated end-to-end with a forecast model, making it difficult to assess the quality of their analysis fields.