Fast-SLM: Towards Latency-Optimal Hybrid Small Language Models
Yonggan Fu, Xin Dong, Shizhe Diao, Matthijs Van Keirsbilck, Hanrong Ye, Wonmin Byeon, Yashaswi Karnati, Lucas Liebenwein, Maksim Khadkevich, Alexander Keller, Jan Kautz, Yingyan Celine Lin, Pavlo Molchanov
December 2025
arXiv
Type
Conference paper
Publication
Advances in Neural Information Processing Systems (NeurIPS)
Related
Hymba: A Hybrid-head Architecture for Small Language Models
CLIMB: Clustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training
LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models
LongMamba: Enhancing Mamba's Long-Context Capabilities via Training-Free Receptive Field Enlargement
GSPN-2: Efficient Parallel Sequence Modeling