Long Video Training

Scaling Video Training with Parallelism

Long-video training changes the unit of distributed computation: when a single video sample is too long for one GPU, the sample itself, not just the batch, has to be split across devices. This post explains how sequence parallelism scales training in that regime and compares two approaches, LongVILA MM-SP and LongLive-2.0 Balanced SP.

Yukang Chen, Luozhou Wang, Wei Huang, Shuai Yang, Weian Mao, Song Han

Jun 3, 2026 · 1 min read