Scaling Video Training with Parallelism
We published a new blog post: Scaling Video Training with Parallelism.
Long-video training changes the unit of distributed computation: instead of splitting work only across samples, the system must also split the token sequence of a single long video sample across devices. The post discusses sequence parallelism for long-video understanding and generation, including LongVILA MM-SP and LongLive-2.0 Balanced SP.
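To make "splitting inside one sample" concrete, here is a minimal sketch of the generic idea: one video's token sequence is cut into contiguous ranges, one per device. It is not the LongVILA MM-SP or LongLive-2.0 Balanced SP scheme. The function name `shard_sequence` and the frame and token counts are hypothetical, chosen only for illustration.

```python
from __future__ import annotations


def shard_sequence(num_tokens: int, world_size: int) -> list[tuple[int, int]]:
    """Return one [start, end) token range per rank, as evenly sized as possible."""
    if world_size <= 0:
        raise ValueError("world_size must be positive")
    # Every rank gets `base` tokens; the first `extra` ranks get one more.
    base, extra = divmod(num_tokens, world_size)
    ranges = []
    start = 0
    for rank in range(world_size):
        length = base + (1 if rank < extra else 0)
        ranges.append((start, start + length))
        start += length
    return ranges


# Hypothetical sizes: one video of 10 frames, 4 tokens per frame, 4 devices.
frames, tokens_per_frame, world_size = 10, 4, 4
shards = shard_sequence(frames * tokens_per_frame, world_size)
for rank, (start, end) in enumerate(shards):
    print(f"rank {rank}: tokens [{start}, {end})")
```

Each rank holds only its own slice of the sequence, so attention layers need communication between ranks to see the rest. An even split by token count is the simplest choice; schemes that balance work across modalities or frames would divide the sequence differently.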