Small Language Models are the Future of Agentic AI

NVIDIA Research

An illustration of agentic systems with different modes of agency. Left: Language model agency. The language model acts both as the human-computer interface (HCI) and as the orchestrator of tool calls to carry out a task. Right: Code agency. The language model (optionally) fills the role of the HCI, while dedicated controller code orchestrates all interactions.

Abstract

Large language models (LLMs) are often praised for exhibiting near-human performance on a wide range of tasks and valued for their ability to hold a general conversation. The rise of agentic AI systems is, however, ushering in a wave of applications in which language models perform a small number of specialized tasks repetitively and with little variation.

Here we lay out the position that small language models (SLMs) are sufficiently powerful, inherently more suitable, and necessarily more economical for many invocations in agentic systems, and are therefore the future of agentic AI. Our argument is grounded in the current level of capabilities exhibited by SLMs, the common architectures of agentic systems, and the economics of LM deployment. We further argue that in situations where general-purpose conversational abilities are essential, heterogeneous agentic systems (i.e., agents invoking multiple different models) are the natural choice. We discuss the potential barriers to the adoption of SLMs in agentic systems and outline a general LLM-to-SLM agent conversion algorithm.
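
To make the conversion algorithm concrete, below is a minimal, runnable sketch of the loop it describes: collect the agent's LLM usage logs, curate them, cluster recurring tasks, then fine-tune and deploy an SLM per cluster, iterating as new data arrives. All data structures and helper names here are illustrative stand-ins, not the paper's implementation.

from collections import defaultdict

def collect_usage_logs(agent):
    # S1: in production, this would log the agent's non-user-facing LLM calls.
    return agent.get("logged_calls", [])

def curate(calls):
    # S2: drop malformed or sensitive examples before any training.
    return [c for c in calls if c.get("prompt") and c.get("response")]

def cluster_by_task(calls):
    # S3: group recurring requests; a real system might embed and cluster them.
    clusters = defaultdict(list)
    for call in calls:
        clusters[call["task_kind"]].append(call)
    return clusters

def convert_agent(agent, select_slm, fine_tune, rounds=3):
    # S4-S6: pick a candidate SLM per cluster, specialize it on the cluster's
    # examples, route future calls of that kind to it, and iterate as new
    # usage data accumulates.
    for _ in range(rounds):
        clusters = cluster_by_task(curate(collect_usage_logs(agent)))
        for task_kind, examples in clusters.items():
            agent.setdefault("routes", {})[task_kind] = fine_tune(
                select_slm(task_kind), examples
            )
    return agent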

Our position, formulated as a value statement, highlights the significance of the operational and economic impact that even a partial shift from LLMs to SLMs would have on the AI agent industry. We aim to stimulate discussion on the effective use of AI resources and hope to advance efforts to lower the costs of present-day AI. Calling for both contributions to and critiques of our position, we commit to publishing all such correspondence on this website.

Recommendations

While our paper presents a value statement advocating for the use of small language models (SLMs) in agentic AI systems, we also provide practical recommendations for organizations and developers looking to implement this approach.

  • Prioritize SLMs for Cost-Effective Deployment. Organizations should consider adopting small language models for agentic applications to reduce latency, energy consumption, and infrastructure costs, particularly in scenarios where real-time or on-device inference is required.
  • Design Modular Agentic Systems. Developers are encouraged to structure agentic systems as a heterogeneous mix of models, leveraging SLMs for routine, narrow tasks and reserving LLMs for more complex reasoning, thereby improving efficiency and maintainability (see the routing sketch after this list).
  • Leverage SLMs for Rapid Specialization. Teams should take advantage of the agility of SLMs by fine-tuning them for specific tasks, enabling faster iteration cycles and easier adaptation to evolving use cases and requirements (a minimal fine-tuning sketch follows the routing example below).
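
As an illustration of the modular, heterogeneous design recommended above, the sketch below routes narrow, repetitive subtasks to a local SLM and reserves a hosted LLM for open-ended reasoning. The endpoints, task taxonomy, and stub outputs are hypothetical placeholders, not a real API.

from dataclasses import dataclass

@dataclass
class Task:
    kind: str    # e.g. "extract_json", "classify_intent", "plan"
    prompt: str

# Subtask kinds routine enough for a specialized SLM to handle.
ROUTINE_KINDS = {"extract_json", "classify_intent", "fill_template"}

def call_slm(prompt: str) -> str:
    # Placeholder: query a local or on-device SLM endpoint here.
    return f"[SLM] {prompt[:40]}"

def call_llm(prompt: str) -> str:
    # Placeholder: query a hosted general-purpose LLM here.
    return f"[LLM] {prompt[:40]}"

def dispatch(task: Task) -> str:
    # Narrow, repetitive work goes to the SLM; everything else falls back
    # to the LLM, keeping general conversational ability available.
    if task.kind in ROUTINE_KINDS:
        return call_slm(task.prompt)
    return call_llm(task.prompt)

if __name__ == "__main__":
    print(dispatch(Task("extract_json", "Pull the order ID from this email: ...")))
    print(dispatch(Task("plan", "Draft a multi-step research plan for ...")))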
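
For rapid specialization, parameter-efficient fine-tuning keeps the iteration loop short. Below is a minimal sketch using the Hugging Face peft library; the model identifier, target modules, and hyperparameters are assumptions to be adapted to the SLM at hand.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "your-org/your-small-model"  # placeholder model identifier
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Attach low-rank adapters so only a small fraction of weights is trained.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # assumption: attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()

# Train with your usual loop or trainer on task-specific traces collected
# from the agent, then merge or serve the adapter alongside the base model.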

Correspondence

We welcome correspondence on the contents of this paper, including critiques, suggestions, and contributions that can help refine our position and advance the discussion on the effective use of AI resources. If you have feedback or would like to contribute to the discussion, please reach out to us via email at agents@nvidia.com. Where the corresponding author has given permission, correspondence will be published on this website to foster an open dialogue within the research community and beyond.

BibTeX

@misc{belcak2025small,
  title        = {Small Language Models are the Future of Agentic AI},
  author       = {Belcak, Peter and Heinrich, Greg and Diao, Shizhe and Fu, Yonggan and Dong, Xin and Muralidharan, Saurav and Lin, Yingyan Celine and Molchanov, Pavlo},
  year         = {2025},
  eprint       = {2506.02153},
  archivePrefix = {arXiv},
  primaryClass = {cs.AI},
  url          = {https://arxiv.org/abs/2506.02153},
  doi          = {10.48550/arXiv.2506.02153}
}