NVIDIA Nemotron 3 Family of Models

We announce NVIDIA Nemotron 3, the most efficient family of open models with leading accuracy for agentic AI applications. The Nemotron 3 family consists of three models: Nano, Super, and Ultra. These models deliver strong agentic, reasoning, and conversational capabilities.

Nano, the smallest model, outperforms comparable models in accuracy while remaining extremely cost-efficient for inference. Super is optimized for collaborative agents and high-volume workloads such as IT ticket automation. Ultra, the largest model, provides state-of-the-art accuracy and reasoning performance.

We are releasing the Nemotron 3 Nano model and technical report. Super and Ultra releases will follow in the coming months.

Nemotron 3 technologies

  • Hybrid MoE: The Nemotron 3 family uses a hybrid Mamba-Transformer MoE architecture that delivers best-in-class throughput with accuracy better than or on par with standard Transformers.
  • LatentMoE: Super and Ultra use LatentMoE, a novel hardware-aware expert design that improves accuracy.
  • Multi-Token Prediction: Super and Ultra incorporate MTP layers for improved long-form text generation efficiency and better model quality.
  • NVFP4: Super and Ultra are trained with NVFP4, NVIDIA's 4-bit floating-point format.
  • Long Context: Nemotron 3 models support context length up to 1M tokens.
  • Multi-environment Reinforcement Learning Post-training: Nemotron 3 models are post-trained with a diverse set of RL environments, helping them achieve superior accuracy across a broad range of tasks.
  • Granular Reasoning Budget Control at Inference Time: Nemotron 3 models are trained to respect an inference-time budget on reasoning tokens; a minimal sketch of this control follows this list.
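
Below is a minimal sketch of what inference-time reasoning budget control can look like, assuming a <think> ... </think> reasoning format and a two-phase generation strategy. The checkpoint name, tag strings, and overall approach are illustrative assumptions, not the official Nemotron inference API.

```python
# Minimal sketch of capping the reasoning (thinking) token budget at inference
# time. Assumptions: a <think> ... </think> reasoning format and a hypothetical
# checkpoint name; this is NOT the official Nemotron inference API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "nvidia/Nemotron-3-Nano"  # hypothetical checkpoint name
THINK_BUDGET = 512                # max tokens the model may spend on reasoning

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "How many primes are there below 50?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Phase 1: let the model reason, but stop after THINK_BUDGET new tokens.
with_thinking = model.generate(input_ids, max_new_tokens=THINK_BUDGET)

# Phase 2: force-close the reasoning block and generate the final answer.
close_tag = tokenizer("</think>", return_tensors="pt", add_special_tokens=False).input_ids
with_answer = model.generate(
    torch.cat([with_thinking, close_tag.to(model.device)], dim=-1),
    max_new_tokens=256,
)
print(tokenizer.decode(with_answer[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

A fuller implementation would also check whether the model closed its reasoning block on its own before the budget was reached, rather than always appending the closing tag.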

Nemotron 3 Nano

Nemotron 3 Nano is a mixture-of-experts model with 31.6B total parameters, of which 3.2B are active per forward pass (3.6B including embeddings). It achieves better accuracy than our previous-generation Nemotron 2 Nano while activating less than half as many parameters per forward pass.
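
To see how a model can hold 31.6B total parameters yet activate only a few billion per token, here is a back-of-the-envelope sketch of sparse top-k expert routing. The shared/expert sizes, expert count, and top-k below are illustrative placeholders, not the actual Nemotron 3 Nano configuration.

```python
# Back-of-the-envelope MoE parameter accounting. All sizes below are
# illustrative placeholders, NOT the real Nemotron 3 Nano configuration.
def moe_param_counts(shared: float, expert: float, n_experts: int, top_k: int):
    """Return (total, active) parameter counts for a sparse MoE.

    shared: parameters used for every token (attention/Mamba blocks, router, ...)
    expert: parameters in a single expert
    """
    total = shared + n_experts * expert
    active = shared + top_k * expert  # only the routed experts run per token
    return total, active

total, active = moe_param_counts(shared=1.5e9, expert=0.5e9, n_experts=60, top_k=4)
print(f"total ≈ {total / 1e9:.1f}B, active ≈ {active / 1e9:.1f}B")
# total ≈ 31.5B, active ≈ 3.5B: the same ballpark as Nano's 31.6B total / 3.2B active,
# because only the top-k routed experts contribute weights to each forward pass.
```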

Key highlights:

  • More accurate than GPT-OSS-20B and Qwen3-30B-A3B-Thinking-2507 on popular benchmarks spanning different categories.
  • On the 8K input / 16K output setting with a single H200, Nemotron 3 Nano provides inference throughput that is 3.3x higher than Qwen3-30B-A3B and 2.2x higher than GPT-OSS-20B.
  • Supports context length up to 1M tokens while outperforming both GPT-OSS-20B and Qwen3-30B-A3B-Instruct-2507 on RULER across different context lengths.
  • We are releasing the model weights, training recipe, and all the data for which we hold redistribution rights.

Open Source

Along with the Nemotron 3 white paper and the Nemotron 3 Nano technical report, we are releasing the following:

Checkpoints:

Data:

  • Nemotron-CC-v2.1: 2.5 trillion new English tokens from Common Crawl, including curated data from 3 recent snapshots, synthetic rephrasing, and translation to English from other languages.
  • Nemotron-CC-Code-v1: A pretraining dataset of 428 billion high-quality code tokens obtained by processing Common Crawl code pages with the Lynx + LLM pipeline from Nemotron-CC-Math-v1, which preserves equations and code, standardizes math equations to LaTeX, and removes noise.
  • Nemotron-Pretraining-Code-v2: A refresh of curated GitHub code references with multi-stage filtering, deduplication, and quality filters, plus large-scale synthetic code data.
  • Nemotron-Pretraining-Specialized-v1: Collection of synthetic datasets for specialized areas like STEM reasoning and scientific coding.
  • Nemotron-SFT-Data: Collection of new Nemotron 3 Nano SFT datasets.
  • Nemotron-RL-Data: Collection of new Nemotron 3 Nano RL datasets.

Model Recipes:

For more details, please refer to the following: