NVIDIA Nemotron 3 Ultra

Published:


Nemotron 3 Ultra Overview

We present our most capable model yet: Nemotron 3 Ultra, with 550 billion total and 55 billion active parameters. It is the final and strongest model in the Nemotron 3 family.
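To make the total versus active distinction concrete, the short sketch below computes the share of parameters used per token from the two figures quoted above. The helper name `active_fraction` is hypothetical and introduced only for illustration.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Return the fraction of parameters active per token in a sparse model.

    Both arguments are in billions of parameters.
    """
    if total_params_b <= 0:
        raise ValueError("total parameter count must be positive")
    if not 0 <= active_params_b <= total_params_b:
        raise ValueError("active parameters must lie between 0 and the total")
    return active_params_b / total_params_b


# Figures from the overview: 550B total, 55B active.
print(f"{active_fraction(550, 55):.1%}")  # prints 10.0%
```

With these figures, roughly one tenth of the weights participate in each token's forward pass, which is the usual reason a Mixture-of-Experts model can be much cheaper to run than a dense model of the same total size.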

Key Features

  1. Employs a Mixture-of-Experts hybrid Mamba-Attention architecture.
  2. Leverages LatentMoE for improved accuracy.
  3. Includes MTP layers for faster inference through native speculative decoding.
  4. Supports inference-time reasoning budget control.
  5. Pretrained in NVFP4.
  6. Post-trained with an enhanced pipeline involving Supervised Fine-Tuning (SFT), Reinforcement Learning (RL), and Multi-teacher On-Policy Distillation (MOPD) for improved model accuracy.
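As background for the speculative decoding feature, here is a minimal sketch of one generic draft-then-verify step with greedy acceptance. It is not NVIDIA's implementation: the function `speculative_step` and the toy models `toy_target` and `toy_draft` are hypothetical stand-ins, and how the MTP layers act as the drafter is described in the Tech Report rather than here. Production systems verify all drafted positions in a single batched forward pass and may use probabilistic acceptance instead of exact matching.

```python
from typing import Callable, List

# A "model" here is any function mapping a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_step(target: NextToken, draft: NextToken,
                     prefix: List[int], k: int) -> List[int]:
    """Run one draft-then-verify step and return the newly accepted tokens."""
    # 1. The cheap drafter proposes k tokens autoregressively.
    drafted: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        drafted.append(tok)
        ctx.append(tok)

    # 2. The target checks each proposal in order (sequentially here, for clarity).
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in drafted:
        expected = target(ctx)
        if tok != expected:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every draft was accepted, so the target contributes one bonus token.
    accepted.append(target(ctx))
    return accepted


def toy_target(ctx: List[int]) -> int:
    """Toy target model: counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    """Toy drafter: agrees with the target except right after token 3."""
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10


print(speculative_step(toy_target, toy_draft, [0], k=4))  # [1, 2, 3, 4]
```

In this toy run the drafter gets three of its four proposals right, so a single step emits four tokens that match what the target alone would have produced greedily. The speedup in practice depends on how often the drafter agrees with the target.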

Key Highlights

Comparison

  • Nemotron 3 Ultra achieves 5.9x, 4.8x, and 1.6x higher inference throughput than GLM-5.1-754B-A40B, Kimi-K2.6-1T-A32B, and Qwen-3.5-397B-A17B, respectively, in the 8k-token input / 64k-token output setting.
  • Nemotron 3 Ultra achieves accuracy on par with other state-of-the-art open LLMs across a diverse set of benchmarks.
  • Nemotron 3 Ultra supports a context length of up to 1M tokens and outperforms state-of-the-art open LLMs on RULER at 1M context length.

Open Source

We are releasing the pre-trained, post-trained, and quantized checkpoints along with the datasets used for training.

Checkpoints:

Data:

Model Recipes:

Tech Report

More technical details are available in the Tech Report.