Minitron-SSM: Efficient Hybrid Language Model Compression through Group-Aware SSM Pruning

Hybrid LLM architectures that combine Attention and State Space Models (SSMs) achieve state-of-the-art accuracy and runtime performance. Recent work has demonstrated that applying compression and distillation to Attention-only models yields smaller, more accurate models at a fraction of the training cost. In this work, we explore the effectiveness of compressing Hybrid architectures. We introduce a novel group-aware pruning strategy that preserves the structural integrity of SSM blocks and their sequence modeling capabilities.
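As a rough illustration of what group-aware structured pruning of an SSM layer might look like (not the paper's actual algorithm), the sketch below scores and removes whole state groups together so that the grouped layout of a Mamba-style projection is preserved. The tensor layout, scoring heuristic, and function name are assumptions made for this example.

```python
import torch

def group_aware_ssm_prune(proj_weight: torch.Tensor, num_groups: int, keep_groups: int) -> torch.Tensor:
    """Drop whole SSM state groups instead of individual channels.

    proj_weight: a (num_groups * group_dim, hidden) projection matrix of a
    hypothetical SSM block, laid out group-by-group along dim 0.
    """
    group_dim = proj_weight.shape[0] // num_groups
    groups = proj_weight.view(num_groups, group_dim, -1)
    # Score each group by the L2 norm over all of its channels, so a group
    # is kept or removed as a unit and the grouped block structure survives.
    scores = groups.flatten(1).norm(dim=1)
    keep = torch.topk(scores, k=keep_groups).indices.sort().values
    return groups[keep].reshape(keep_groups * group_dim, -1)

# Example: keep the 4 strongest of 8 groups in a toy projection.
pruned = group_aware_ssm_prune(torch.randn(8 * 16, 256), num_groups=8, keep_groups=4)
print(pruned.shape)  # torch.Size([64, 256])
```

Ranking at group granularity, rather than per channel, is what keeps the remaining parameters compatible with the block's grouped state update.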

Shengze Wang

Shengze Wang joined NVIDIA Research in 2025. He works at the intersection of Computer Vision, Graphics, and AI with a focus on modeling human geometry and behaviors for lifelike robots. His past research spans 3D reconstruction and rendering, human pose estimation, generative models, SLAM, and telepresence systems. He obtained his Ph.D. from the University of North Carolina at Chapel Hill, where he was advised by Dr. Henry Fuchs, a Master of Science in Computer Vision from Carnegie Mellon University, and a Bachelor of Science in Computer Engineering from the University of Illinois at Urbana-Champaign.

Beyond Behavior Cloning in Autonomous Driving: A Survey of Closed-Loop Training Techniques

Behavior cloning, the dominant approach for training autonomous vehicle (AV) policies, suffers from a fundamental gap: policies trained open-loop on temporally independent samples must operate closed-loop, where actions influence future observations. This mismatch can cause covariate shift, compounding errors, and poor interactive behavior, among other issues. Closed-loop training mitigates the problem by exposing policies to the consequences of their actions during training.
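To make the open-loop/closed-loop distinction concrete, the toy sketch below contrasts a behavior-cloning update on logged samples with a DAgger-style closed-loop update in which the policy's own actions generate the states it is then corrected on. The linear policy, scalar dynamics, and proportional expert are invented for illustration and are not drawn from the survey.

```python
import torch

# Toy setup (illustrative only): 1-D "vehicle" offset from lane center,
# a linear policy, and a hand-coded expert that steers back toward 0.
torch.manual_seed(0)
policy = torch.nn.Linear(1, 1)
optimizer = torch.optim.SGD(policy.parameters(), lr=1e-2)
expert = lambda s: -0.5 * s  # expert action: proportional correction

def open_loop_step(logged_states, logged_actions):
    """Behavior cloning: regress expert actions on independent logged states."""
    loss = torch.nn.functional.mse_loss(policy(logged_states), logged_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def closed_loop_step(horizon=32):
    """Closed-loop (DAgger-style): roll the policy out so its own actions
    determine the states the expert then relabels."""
    state = torch.randn(1, 1)
    visited, labels = [], []
    for _ in range(horizon):
        action = policy(state)
        visited.append(state)
        labels.append(expert(state))       # expert queried on *visited* states
        state = state + action.detach()    # the action feeds back into the next state
    loss = torch.nn.functional.mse_loss(policy(torch.cat(visited)), torch.cat(labels))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The key difference is in where the training states come from: open-loop updates never leave the logged distribution, while the closed-loop rollout visits exactly the states the policy's own errors produce.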