Learning to Track Instances without Video Annotations

Tracking segmentation masks of multiple instances has been intensively studied, but still faces two fundamental challenges: 1) the requirement of large-scale, frame-wise annotation, and 2) the complexity of two-stage approaches. To resolve these challenges, we introduce a novel semi-supervised framework that learns instance tracking networks from only a labeled image dataset and unlabeled video sequences. With an instance contrastive objective, we learn an embedding to discriminate each instance from the others.
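The abstract does not spell out its contrastive objective. As a rough illustration of the general idea only, the sketch below uses one common form (an InfoNCE-style loss over cosine similarities), which is an assumption rather than the authors' formulation. The function names, the temperature value, and the toy embeddings are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (assumed form, not taken from the paper).

    Low when `anchor` is closer to `positive` (same instance) than to any
    entry of `negatives` (other instances).
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    # Negative log-probability of picking the positive among all candidates.
    return log_sum - logits[0]

# Toy embeddings: the anchor and positive point in nearly the same direction.
good = info_nce([1.0, 0.0], [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
bad = info_nce([1.0, 0.0], [0.0, 1.0], [[0.9, 0.1]])
```

In this toy setup, `good` is smaller than `bad`, because the loss rewards an embedding in which an instance sits closer to its own match than to other instances.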

Weakly-Supervised Physically Unconstrained Gaze Estimation

A major challenge for physically unconstrained gaze estimation is acquiring training data with 3D gaze annotations for in-the-wild and outdoor scenarios. In contrast, videos of human interactions in unconstrained environments are abundantly available and can be much more easily annotated with frame-level activity labels. In this work, we tackle the previously unexplored problem of weakly-supervised gaze estimation from videos of human interactions.

Contrastive Syn-to-Real Generalization

Training on synthetic data can be beneficial for label- or data-scarce scenarios. However, synthetically trained models often suffer from poor generalization in real domains due to domain gaps. In this work, we make a key observation that the diversity of the learned feature embeddings plays an important role in the generalization performance.

Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

Linear attention replaces the unbounded cache of softmax attention with a fixed-size recurrent state, reducing sequence mixing to linear time and decoding to constant memory. The hard part is not just what to forget, but how to edit this compressed memory without scrambling existing associations. Delta-rule models subtract the current read before writing a new value, and Kimi Delta Attention (KDA) sharpens forgetting with channel-wise decay.
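A minimal sketch of the delta-rule update with channel-wise decay described above, under stated assumptions: the state is a small matrix with one row per value dimension and one column per key dimension, keys are unit-norm, and `beta` is a scalar write strength. The function names and the toy numbers are hypothetical, and this illustrates only the generic erase-then-write step, not the Gated DeltaNet-2 or KDA implementation.

```python
def read_memory(state, key):
    """Return the value the memory currently associates with `key`."""
    return [sum(row[j] * key[j] for j in range(len(key))) for row in state]

def delta_rule_step(state, key, value, beta=1.0, decay=None):
    """One recurrent update of a fixed-size memory (d_v rows x d_k columns).

    decay: optional per-key-channel factors in [0, 1] (assumed form of
    channel-wise forgetting); None means no forgetting.
    """
    d_v, d_k = len(state), len(key)
    if decay is None:
        decay = [1.0] * d_k
    # Forget: scale each key channel (column) by its own decay factor.
    decayed = [[state[i][j] * decay[j] for j in range(d_k)] for i in range(d_v)]
    # Read: what the decayed memory returns for this key.
    current = read_memory(decayed, key)
    # Erase and write: subtract the current read and add the new value,
    # both along the key direction.
    return [
        [decayed[i][j] + beta * (value[i] - current[i]) * key[j] for j in range(d_k)]
        for i in range(d_v)
    ]

state = [[0.0, 0.0], [0.0, 0.0]]
state = delta_rule_step(state, [1.0, 0.0], [3.0, 4.0])  # store first association
state = delta_rule_step(state, [0.0, 1.0], [5.0, 6.0])  # store second association
state = delta_rule_step(state, [1.0, 0.0], [7.0, 8.0])  # overwrite the first one
```

With orthogonal unit keys and `beta=1.0`, the overwrite replaces the first association without disturbing the second: `read_memory(state, [1.0, 0.0])` gives `[7.0, 8.0]` and `read_memory(state, [0.0, 1.0])` still gives `[5.0, 6.0]`. With non-orthogonal keys the edit would partially affect other associations.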

Monitor refresh rate impacts FPS video gamers' perceptions of display ‘smoothness’ and target acquisition performance

Esports, particularly First-Person Shooter (FPS) games, rely heavily on one's ability to rapidly perceive and respond to visual targets, a skill known as target acquisition. Modern gaming monitors increasingly feature higher refresh rates (up to 360Hz). The current study examined whether FPS gamers can perceptually distinguish between different monitor refresh rates (60Hz, 144Hz, 360Hz) and whether these differences translate to performance improvements in target acquisition tasks. Gamers (N = 101) completed a custom FPS task across three refresh rate conditions.

Adaptive Time Delay for Improving Player Experience and Fairness in First-Person Shooter Games with Network Latency

In a multiplayer networked game, actions from players with higher latencies are received and (potentially) acted upon later than those from players with lower latencies, leading to unfairness, which matters especially in competitive games. Time delay is a latency compensation technique that can mitigate this unfairness by adding latency to lower-latency players so that all players experience the same latency. Although this equalizes latency across players, it unnecessarily degrades responsiveness for the lower-latency players when the players are not interacting.
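The basic (non-adaptive) time delay idea can be sketched as simple arithmetic: each player receives enough added delay to match the highest-latency player. The function name, player labels, and latency values below are hypothetical, and the sketch does not implement the adaptive variant the abstract proposes.

```python
def added_delays(latencies_ms):
    """Return the artificial delay (ms) per player that equalizes latency.

    latencies_ms: mapping of player name to measured latency in milliseconds.
    """
    worst = max(latencies_ms.values())
    # Every player is padded up to the worst latency; the slowest gets 0.
    return {player: worst - latency for player, latency in latencies_ms.items()}

delays = added_delays({"alice": 20, "bob": 80, "carol": 50})
# delays == {"alice": 60, "bob": 0, "carol": 30}
```

Here the lowest-latency player absorbs the largest added delay (60 ms) whether or not the players are interacting, which is the responsiveness cost the abstract points to.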

Impact of Frametime Spikes on Performance and Quality of Experience in Platformer Games

Frametime spikes can disrupt gameplay, affecting both player performance and experience, but the effects of these spikes on navigation-based tasks are not well studied. This work investigates how frametime spikes impact players performing navigation-focused tasks in a platformer game. An open-source platformer game, SuperTux Classic, was modified to deliberately create spikes in frametimes when players performed certain actions, while recording performance and assessing quality of experience (QoE).