Learning to Track Instances without Video Annotations

Tracking segmentation masks of multiple instances has been intensively studied, but still faces two fundamental challenges: 1) the requirement of large-scale, frame-wise annotation, and 2) the complexity of two-stage approaches. To resolve these challenges, we introduce a novel semi-supervised framework that learns instance tracking networks from only a labeled image dataset and unlabeled video sequences. With an instance contrastive objective, we learn an embedding that discriminates each instance from the others.
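
As a rough illustration of what an instance contrastive objective can look like, the sketch below implements a generic InfoNCE-style loss over instance embeddings in NumPy. It is not taken from the paper: the function name `instance_contrastive_loss`, the `temperature` value, and the choice to average over all positives per anchor are assumptions made for this example only.

```python
import numpy as np

def instance_contrastive_loss(embeddings, instance_ids, temperature=0.1):
    """Generic InfoNCE-style loss (illustrative, not the paper's exact loss).

    embeddings:   (N, D) array, one embedding per sample, N >= 2.
    instance_ids: (N,) array; samples sharing an id are treated as positives.
    """
    z = np.asarray(embeddings, dtype=np.float64)
    ids = np.asarray(instance_ids)
    # L2-normalize so that the dot product is a cosine similarity.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature

    n = len(ids)
    not_self = ~np.eye(n, dtype=bool)
    positives = (ids[:, None] == ids[None, :]) & not_self
    pos_counts = positives.sum(axis=1)
    valid = pos_counts > 0
    if not valid.any():
        raise ValueError("no anchor has a positive sample")

    # Numerically stable log-sum-exp over all samples except the anchor itself.
    masked = np.where(not_self, sim, -np.inf)
    row_max = masked.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(np.exp(masked - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - log_denom

    # Average the negative log-probability of the positives for each anchor.
    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    per_anchor = -pos_log_prob[valid] / pos_counts[valid]
    return float(per_anchor.mean())
```

When embeddings of the same instance lie close together and different instances lie far apart, the loss is small. Assigning the same embeddings to mismatched instance ids yields a much larger value.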

Weakly-Supervised Physically Unconstrained Gaze Estimation

A major challenge for physically unconstrained gaze estimation is acquiring training data with 3D gaze annotations for in-the-wild and outdoor scenarios. In contrast, videos of human interactions in unconstrained environments are abundantly available and can be much more easily annotated with frame-level activity labels. In this work, we tackle the previously unexplored problem of weakly-supervised gaze estimation from videos of human interactions.

Contrastive Syn-to-Real Generalization

Training on synthetic data can be beneficial for label- or data-scarce scenarios. However, synthetically trained models often suffer from poor generalization in real domains due to domain gaps. In this work, we make a key observation that the diversity of the learned feature embeddings plays an important role in the generalization performance.

Timing Matters: The Impact of Event-Specific Frametime Spikes in First-Person Shooter Games

Frametime spikes can disrupt gameplay in first-person shooter (FPS) games, affecting both performance and player experience. This paper examines how spikes during specific game events impact players. We developed a custom FPS game that maintains a steady 500 frames/s while inducing frametime spikes during weapon reloading, fast mouse movement, or targeting. Thirty-eight participants played the game in a user study, providing both performance data and self-reported ratings of visual smoothness.
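
To make the setup concrete: a steady 500 frames/s corresponds to a baseline frame time of 1000 / 500 = 2 ms. The sketch below builds a per-frame timing trace in which frames tagged with a chosen event receive an extra delay. The event labels, the function name `frametimes_with_spikes`, and the 100 ms spike size are hypothetical values chosen for illustration; the abstract does not state the spike magnitudes used in the study.

```python
BASE_FPS = 500
BASE_FRAMETIME_MS = 1000.0 / BASE_FPS  # 2.0 ms per frame at 500 frames/s

def frametimes_with_spikes(events, spike_ms, trigger_events):
    """Return per-frame durations in milliseconds.

    events:         one label per frame (None when nothing happens).
    spike_ms:       extra delay added to a triggering frame (hypothetical value).
    trigger_events: set of labels that should induce a spike.
    """
    return [
        BASE_FRAMETIME_MS + (spike_ms if event in trigger_events else 0.0)
        for event in events
    ]

# Hypothetical trace: only the "reload" frame is configured to spike.
trace = frametimes_with_spikes(
    events=[None, "reload", None, "targeting"],
    spike_ms=100.0,
    trigger_events={"reload"},
)
```

Restricting `trigger_events` to one label at a time mirrors the idea of comparing spikes tied to different game events.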

Lead Rush: A First-Person Shooter for User Studies and Understanding Effects of Frame Time Spikes

User studies are a cornerstone of human-computer interaction research, including measures of user performance and quality of experience (QoE) – particularly important for games where frame rates and frame timings can impact performance. Unfortunately, commercial games have limited options for customization and do not log player performance data with sufficient detail for use in such studies. This paper introduces Lead Rush, a first-person shooter game designed for conducting user studies on the effects of frame timing and frame rate.

Impact of Graphical Fidelity and Frame-Time Stutter in a First-Person Shooter Game

Frametime spikes and graphical fidelity both matter for the feel of first-person shooter (FPS) games, yet their combined effects are not well understood. This paper examines how graphics settings and frametime spikes during aiming jointly affect player performance and experience. We developed a custom FPS game with configurable textures, lighting, and visual effects, and induced frametime spikes of 0 ms, 225 ms, or 675 ms during play.

Editing Physiological Signals in Videos Using Latent Representations

Camera-based physiological signal estimation provides a convenient and non-contact way to monitor heart rate, but it also raises serious privacy concerns because facial videos can leak sensitive information about a person’s health and emotional state. We present a learned framework for editing physiological signals in videos while preserving visual fidelity. Our method first encodes an input video into a latent representation using a pretrained 3D Variational Autoencoder, and embeds a target heart-rate prompt through a frozen text encoder.