Learning to Track Instances without Video Annotations

Tracking segmentation masks of multiple instances has been intensively studied, but still faces two fundamental challenges: 1) the requirement of large-scale, frame-wise annotation, and 2) the complexity of two-stage approaches. To resolve these challenges, we introduce a novel semi-supervised framework that learns instance tracking networks from only a labeled image dataset and unlabeled video sequences. With an instance contrastive objective, we learn an embedding that discriminates each instance from the others.
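
The abstract does not spell out the formulation, but an instance contrastive objective of this kind is typically an InfoNCE loss over per-instance embeddings: the same instance under another view attracts, while all other instances repel. A minimal PyTorch sketch under that assumption (all names and the temperature are illustrative):

import torch
import torch.nn.functional as F

def instance_contrastive_loss(emb_a, emb_b, temperature=0.07):
    # Similarity between every instance in view A and every instance in
    # view B; embeddings are assumed L2-normalized, so this is cosine
    # similarity, shape (N, N).
    logits = emb_a @ emb_b.t() / temperature
    # The matching instance in the other view is the positive; every
    # other instance in the batch serves as a negative.
    targets = torch.arange(emb_a.size(0), device=emb_a.device)
    return F.cross_entropy(logits, targets)

# Usage with dummy per-instance embeddings (e.g., pooled from masks).
emb_a = F.normalize(torch.randn(8, 128), dim=1)
emb_b = F.normalize(torch.randn(8, 128), dim=1)
loss = instance_contrastive_loss(emb_a, emb_b)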

Weakly-Supervised Physically Unconstrained Gaze Estimation

A major challenge for physically unconstrained gaze estimation is acquiring training data with 3D gaze annotations for in-the-wild and outdoor scenarios. In contrast, videos of human interactions in unconstrained environments are abundantly available and can be much more easily annotated with frame-level activity labels. In this work, we tackle the previously unexplored problem of weakly-supervised gaze estimation from videos of human interactions.
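
The abstract leaves the supervision signal unspecified, but one plausible way frame-level interaction labels could weakly supervise 3D gaze is geometric: when two people are labeled as looking at each other, person A's predicted gaze should point toward person B's head. A minimal sketch under that assumption (all names are hypothetical):

import torch
import torch.nn.functional as F

def laeo_gaze_loss(gaze_a, head_a, head_b):
    # gaze_a: predicted 3D gaze directions for person A, shape (B, 3).
    # head_a, head_b: 3D head positions of persons A and B, shape (B, 3).
    # For pairs labeled as looking at each other, the direction from A's
    # head to B's head is the weak target for A's gaze.
    target = F.normalize(head_b - head_a, dim=1)
    gaze_a = F.normalize(gaze_a, dim=1)
    # One minus cosine similarity penalizes the angular deviation.
    return (1.0 - (gaze_a * target).sum(dim=1)).mean()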

Contrastive Syn-to-Real Generalization

Training on synthetic data can be beneficial for label- or data-scarce scenarios. However, synthetically trained models often suffer from poor generalization in real domains due to the domain gap. In this work, we make the key observation that the diversity of the learned feature embeddings plays an important role in generalization performance.
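
The abstract does not describe the training objective, but one way to act on this observation is to contrast the features of the model being trained on synthetic data against those of a frozen ImageNet-pretrained encoder: same-image pairs attract, different-image pairs repel, which discourages the learned embedding from collapsing. A minimal sketch under that assumption (names and the temperature are illustrative):

import torch
import torch.nn.functional as F

def syn_to_real_contrastive_loss(feat_task, feat_frozen, temperature=0.1):
    # feat_task: features of a batch of synthetic images from the model
    # being trained, shape (B, D).
    # feat_frozen: features of the same images from a frozen
    # ImageNet-pretrained encoder, shape (B, D).
    feat_task = F.normalize(feat_task, dim=1)
    feat_frozen = F.normalize(feat_frozen, dim=1)
    logits = feat_task @ feat_frozen.t() / temperature
    # Positives are same-image pairs across the two encoders; the other
    # images in the batch act as negatives, so features stay spread out.
    targets = torch.arange(feat_task.size(0), device=feat_task.device)
    return F.cross_entropy(logits, targets)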

Detection of artifacts in clean and corrupted video pairs is influenced by artifact type and presentation modality

Modern computer-generated videos display a variety of artifacts. While image-computable metrics exist to quantify the visibility of artifacts in images and videos, designers often rely in part on human observers to find artifacts and assess video quality. Furthermore, human labeling of artifacts is often an essential component of building image and video quality metrics. Yet, relatively little research has studied the impact of different video comparison interfaces on an observer’s strategies and ability to detect different artifact types.

A Generative AI Game Jam Case Study from October 2024

Generative Artificial Intelligence (GenAI) promises to democratize many creative endeavors, from art to music to writing. However, video games are an underexplored field for GenAI given their highly multi-modal and interactive nature. In this work, we present a case study of a game-jam-style game development process (performed over only a few days!) that made heavy use of available GenAI tools (as of October 2024) to create a game called Plunderwater: Sunken Treasure, a title selected from among GenAI suggestions.

Fly, Fail, Fix: Iterative Game Repair with Reinforcement Learning and Large Multimodal Models

Game design hinges on understanding how static rules and content translate into dynamic player behavior, something modern generative systems that inspect only a game's code or assets struggle to capture. We present an automated design iteration framework that closes this gap by pairing a reinforcement learning (RL) agent, which playtests the game, with a large multimodal model (LMM), which revises the game based on what the agent does. In each loop the RL player completes several episodes, producing
(i) numerical play metrics and/or