NVIDIA Nemotron 3 Family of Models

We announce NVIDIA Nemotron 3, the most efficient family of open models with leading accuracy for agentic AI applications. The Nemotron 3 family consists of three models: Nano, Super, and Ultra. These models deliver strong agentic, reasoning, and conversational capabilities.
Nano, the smallest model, outperforms comparable models in accuracy while remaining extremely cost-efficient for inference. Super is optimized for collaborative agents and high-volume workloads such as IT ticket automation. Ultra, the largest model, provides state-of-the-art accuracy and reasoning performance.
We are releasing the Nemotron 3 Nano model and technical report. Super and Ultra releases will follow in the coming months.
Nemotron 3 technologies
- Hybrid MoE: The Nemotron 3 models use a hybrid Mamba-Transformer MoE architecture that delivers best-in-class throughput with accuracy on par with or better than standard Transformers (see the routing sketch after this list).
- LatentMoE: Super and Ultra use LatentMoE, a novel hardware-aware expert design, for improved accuracy.
- Multi-Token Prediction: Super and Ultra incorporate MTP layers for more efficient long-form text generation and better model quality (a training-loss sketch follows this list).
- NVFP4: Super and Ultra are trained in NVFP4 precision (a quantization sketch follows this list).
- Long Context: Nemotron 3 models support context lengths of up to 1M tokens.
- Multi-environment Reinforcement Learning Post-training: Nemotron 3 models are post-trained with a diverse set of RL environments, helping them achieve superior accuracy across a broad range of tasks.
- Granular Reasoning Budget Control at Inference Time: Nemotron 3 models are trained to respect a user-specified reasoning budget at inference time.
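To make the hybrid MoE idea concrete, here is a minimal, illustrative PyTorch sketch (not the Nemotron 3 implementation) of top-k token routing for a sparse MoE FFN, together with a hypothetical layer ordering that interleaves Mamba, attention, and MoE blocks. The expert count, top-k value, and layer ratio below are placeholder assumptions; the actual architecture is specified in the white paper and technical report.

```python
# Illustrative sketch only -- not the Nemotron 3 implementation.
# Top-k token routing for a sparse MoE FFN, plus a hypothetical layer
# ordering that interleaves Mamba, attention, and MoE blocks.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Each token is routed to its top-k experts; only those experts run."""
    def __init__(self, d_model: int, d_ff: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:    # x: [tokens, d_model]
        scores = self.router(x)                             # [tokens, n_experts]
        top_scores, top_idx = scores.topk(self.k, dim=-1)
        gates = F.softmax(top_scores, dim=-1)                # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for slot in range(self.k):
                mask = top_idx[:, slot] == e                 # tokens whose slot-th pick is expert e
                if mask.any():
                    out[mask] += gates[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Hypothetical block ordering: mostly Mamba mixers, periodic attention,
# with a sparse MoE FFN after every mixer (real ratios are in the tech report).
LAYER_PATTERN = ["mamba", "moe", "mamba", "moe", "mamba", "moe", "attention", "moe"] * 4

if __name__ == "__main__":
    y = TopKMoE(d_model=64, d_ff=256)(torch.randn(10, 64))
    print(y.shape)  # torch.Size([10, 64])
```

Because only k experts run per token, the active parameter count (3.2B for Nano) stays far below the total parameter count (31.6B), which is where the throughput advantage of sparse MoE comes from.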
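The multi-token-prediction bullet can likewise be illustrated with a hedged training-loss sketch: in addition to the standard next-token head, an auxiliary head is trained to predict a token further ahead, and its loss is added with a small weight. The head design, look-ahead distance, and weighting here are illustrative assumptions, not the Nemotron 3 recipe.

```python
# Illustrative MTP training loss -- head design, look-ahead distance,
# and loss weight are assumptions, not the Nemotron 3 recipe.
import torch
import torch.nn as nn
import torch.nn.functional as F

def mtp_loss(hidden, labels, lm_head, mtp_head, mtp_weight=0.3):
    """hidden: [batch, seq, d_model] final hidden states; labels: [batch, seq] token ids."""
    # Standard next-token loss: position t predicts token t+1.
    ntp_logits = lm_head(hidden[:, :-1])
    ntp = F.cross_entropy(ntp_logits.flatten(0, 1), labels[:, 1:].flatten())
    # Auxiliary multi-token loss: position t also predicts token t+2.
    mtp_logits = mtp_head(hidden[:, :-2])
    mtp = F.cross_entropy(mtp_logits.flatten(0, 1), labels[:, 2:].flatten())
    return ntp + mtp_weight * mtp

# Toy usage with random tensors.
d_model, vocab = 64, 1000
lm_head = nn.Linear(d_model, vocab, bias=False)
mtp_head = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(),
                         nn.Linear(d_model, vocab, bias=False))
hidden = torch.randn(2, 16, d_model)
labels = torch.randint(0, vocab, (2, 16))
print(mtp_loss(hidden, labels, lm_head, mtp_head))
```

At inference time, MTP-style heads can also draft several tokens at once for self-speculative decoding, which is one way such layers can improve long-form generation efficiency.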
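For the NVFP4 bullet, the sketch below shows the core idea of 4-bit floating-point (E2M1) quantization with fine-grained per-block scaling; the block size and scale handling are simplified assumptions, and the actual NVFP4 training methodology is covered in the white paper.

```python
# Simplified fake-quantization to a 4-bit E2M1 grid with per-block scales.
# Block size and scale handling are assumptions; this is not NVFP4 training code.
import torch

E2M1_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # representable magnitudes

def fake_quant_fp4(x: torch.Tensor, block: int = 16) -> torch.Tensor:
    """Quantize-dequantize x onto the E2M1 grid with one scale per `block` values."""
    flat = x.flatten()
    pad = (-flat.numel()) % block
    flat = torch.cat([flat, flat.new_zeros(pad)]).view(-1, block)
    scale = flat.abs().amax(dim=1, keepdim=True) / E2M1_GRID.max()   # per-block scale factor
    scale = torch.where(scale == 0, torch.ones_like(scale), scale)
    mags = (flat / scale).abs()
    idx = (mags.unsqueeze(-1) - E2M1_GRID).abs().argmin(dim=-1)      # nearest grid point
    deq = E2M1_GRID[idx] * flat.sign() * scale
    return deq.flatten()[: x.numel()].view_as(x)

x = torch.randn(4, 32)
print((x - fake_quant_fp4(x)).abs().max())   # worst-case quantization error
```

The fine-grained block scales keep relative quantization error small even though each value is snapped to one of only eight magnitudes.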
Nemotron 3 Nano

Nemotron 3 Nano is a mixture-of-experts model with 3.2B active parameters (3.6B including embeddings) out of 31.6B total parameters. It achieves better accuracy than our previous-generation Nemotron 2 Nano while activating less than half as many parameters per forward pass.
Key highlights:
- More accurate than GPT-OSS-20B and Qwen3-30B-A3B-Thinking-2507 on popular benchmarks spanning different categories.
- In the 8K-input/16K-output setting on a single H200 GPU, Nemotron 3 Nano delivers inference throughput 3.3x higher than Qwen3-30B-A3B and 2.2x higher than GPT-OSS-20B.
- Supports context lengths of up to 1M tokens and outperforms both GPT-OSS-20B and Qwen3-30B-A3B-Instruct-2507 on RULER across context lengths.
- We are releasing the model weights, training recipe, and all the data for which we hold redistribution rights.
Open Source
Along with the Nemotron 3 white paper and the Nemotron 3 Nano technical report, we are releasing the following:
Checkpoints (a loading example follows this list):
- Nemotron 3 Nano 30B-A3B FP8: the final post-trained and FP8 quantized Nano model
- Nemotron 3 Nano 30B-A3B BF16: the post-trained Nano model
- Nemotron 3 Nano 30B-A3B Base BF16: the pre-trained base Nano model
- Qwen-3-Nemotron-235B-A22B-GenRM: the generative reward model (GenRM) used for RLHF
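A minimal example of loading one of the released checkpoints with Hugging Face Transformers is sketched below. The repository id, dtype choice, and chat-template usage are assumptions for illustration; check the model cards for the exact identifiers and the recommended serving setup.

```python
# Hedged usage sketch -- the repository id below is illustrative, not confirmed;
# see the model cards for exact checkpoint names and recommended serving setups.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "nvidia/Nemotron-3-Nano-30B-A3B-BF16"  # hypothetical identifier
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype="auto", device_map="auto", trust_remote_code=True
)

messages = [{"role": "user", "content": "Summarize the Nemotron 3 Nano release in one sentence."}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=256)
print(tok.decode(out[0, inputs.shape[-1]:], skip_special_tokens=True))
```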
Data:
- Nemotron-CC-v2.1: 2.5 trillion new English tokens from Common Crawl, including curated data from 3 recent snapshots, synthetic rephrasing, and translation to English from other languages.
- Nemotron-CC-Code-v1: A pretraining dataset of 428 billion high-quality code tokens, obtained by processing Common Crawl code pages with the Lynx + LLM pipeline from Nemotron-CC-Math-v1. The pipeline preserves equations and code, standardizes math equations to LaTeX, and removes noise.
- Nemotron-Pretraining-Code-v2: A refresh of curated GitHub code references with multi-stage filtering, deduplication, and quality filters, plus large-scale synthetic code data.
- Nemotron-Pretraining-Specialized-v1: Collection of synthetic datasets for specialized areas like STEM reasoning and scientific coding.
- Nemotron-SFT-Data: Collection of new Nemotron 3 Nano SFT datasets.
- Nemotron-RL-Data: Collection of new Nemotron 3 Nano RL datasets.
Model Recipes:
For more details, please refer to the following:
- Nemotron 3 Blogs
- Nemotron 3 white paper: NVIDIA Nemotron 3: Efficient and Open Intelligence
- Nemotron 3 Nano technical report: Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning