The transformative capabilities of language models (LMs) have intensified the demand for their deployment on everyday devices, necessitating efficient processing for on-device language tasks. To address this, we propose Hymba, a new family of small language models featuring a hybrid-head architecture that strategically integrates attention mechanisms with state space models (SSMs). This architecture leverages the strengths of both mechanisms: attention heads provide high-resolution recall, akin to snapshot memories in the human brain, while SSM heads offer efficient context summarization, similar to fading memories. To further enhance Hymba’s performance, we introduce learnable meta tokens that are prepended to input sequences and jointly trained with the model weights during pretraining. These meta tokens act as a learned cache initialization at inference time, modulating all subsequent tokens within the hybrid heads and boosting the model’s focus on salient information, similar to metamemory. Extensive experiments and ablation studies demonstrate that Hymba achieves new state-of-the-art results for small LMs across a wide range of benchmarks and advances their accuracy-efficiency trade-off. For instance, Hymba-1.5B matches the commonsense reasoning accuracy of Llama 3.2 3B while being 3.49x faster and offering a 14.72x reduction in cache size.
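To make the idea concrete, below is a minimal, hypothetical PyTorch sketch of a hybrid-head block, not the released Hymba implementation: learnable meta tokens are prepended to the input sequence, an attention branch and a simplified SSM-style branch (a toy diagonal recurrence standing in for a real SSM such as Mamba) process the extended sequence, and their outputs are fused. All module names, dimensions, and the averaging fusion are illustrative assumptions.

```python
# Conceptual sketch of a hybrid-head block with prepended meta tokens.
# NOT the official Hymba code; the SSM branch is a toy diagonal recurrence.
import torch
import torch.nn as nn


class HybridHeadBlock(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 4, n_meta: int = 8):
        super().__init__()
        # Learnable meta tokens, prepended to every sequence and trained jointly.
        self.meta_tokens = nn.Parameter(torch.randn(n_meta, d_model) * 0.02)
        self.n_meta = n_meta

        # Attention heads: high-resolution recall over the (meta + input) sequence.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

        # Toy SSM heads: per-channel recurrence h_t = a * h_{t-1} + (1 - a) * x_t.
        self.in_proj = nn.Linear(d_model, d_model)
        self.decay_logit = nn.Parameter(torch.zeros(d_model))  # fading-memory decay
        self.out_proj = nn.Linear(d_model, d_model)

        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        b, t, d = x.shape
        meta = self.meta_tokens.unsqueeze(0).expand(b, -1, -1)
        h = torch.cat([meta, x], dim=1)  # prepend meta tokens
        h_norm = self.norm(h)
        length = t + self.n_meta

        # Attention branch with a causal mask over the extended sequence.
        causal = torch.triu(
            torch.ones(length, length, dtype=torch.bool, device=x.device), 1
        )
        attn_out, _ = self.attn(h_norm, h_norm, h_norm, attn_mask=causal)

        # SSM branch: sequential scan with exponential decay (context summary).
        u = self.in_proj(h_norm)
        a = torch.sigmoid(self.decay_logit)  # decay in (0, 1)
        state = torch.zeros(b, d, device=x.device, dtype=x.dtype)
        ssm_states = []
        for step in range(length):
            state = a * state + (1 - a) * u[:, step]
            ssm_states.append(state)
        ssm_out = self.out_proj(torch.stack(ssm_states, dim=1))

        # Fuse the two branches (simple average here) and keep only input positions.
        fused = h + 0.5 * (attn_out + ssm_out)
        return fused[:, self.n_meta:]


# Usage: process a batch of token embeddings.
block = HybridHeadBlock()
out = block(torch.randn(2, 32, 256))
print(out.shape)  # torch.Size([2, 32, 256])
```

In this sketch, the decay parameter plays the role of the SSM heads’ fading memory while the attention branch retains exact token-level recall, and the meta tokens sit at the start of the sequence so every subsequent token can attend to them, loosely mimicking the learned cache initialization described above.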