NVIDIA Nemotron 3 Super


Nemotron 3 Super Overview

We are releasing NVIDIA Nemotron 3 Super, a Mixture-of-Experts hybrid Mamba-Transformer model with 12B active and 120B total parameters. Nemotron 3 Super is part of the Nemotron 3 series of models and is the first model in the series that:

  1. Leverages LatentMoE for improved accuracy (see the routing sketch below).
  2. Includes MTP layers for faster inference through native speculative decoding (see the draft-and-verify sketch below).
  3. Is pretrained in NVFP4 (see the quantization sketch below).
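
To make the 12B-active / 120B-total distinction concrete, here is a minimal sketch of top-k expert routing, the generic mechanism by which an MoE layer activates only a few experts per token. This is standard top-k routing in PyTorch, not Nemotron 3 Super's actual LatentMoE layer; the `TopKMoE` class and all sizes are illustrative assumptions (see the Tech Report for the real architecture).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal top-k routed mixture-of-experts feed-forward layer (illustrative)."""

    def __init__(self, d_model: int, d_ff: int, n_experts: int, k: int):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (n_tokens, d_model)
        logits = self.router(x)                           # (n_tokens, n_experts)
        weights, idx = torch.topk(logits, self.k, dim=-1) # route each token to k experts
        weights = F.softmax(weights, dim=-1)              # renormalize over the chosen k
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            tok, slot = torch.where(idx == e)             # tokens assigned to expert e
            if tok.numel() > 0:
                out[tok] += weights[tok, slot].unsqueeze(-1) * expert(x[tok])
        return out

moe = TopKMoE(d_model=64, d_ff=256, n_experts=8, k=2)
y = moe(torch.randn(10, 64))
total = sum(p.numel() for p in moe.parameters())
per_expert = sum(p.numel() for p in moe.experts[0].parameters())
active = total - (len(moe.experts) - moe.k) * per_expert  # router + k experts per token
print(y.shape, f"total params: {total}, active per token: {active}")
```

Because each token passes through only k of the n experts, the parameters touched per token (the "active" count) are far fewer than the total, which is how a 120B-parameter model can run with only 12B active parameters.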
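
The MTP (multi-token prediction) layers give the model a built-in draft head, enabling speculative decoding without a separate draft model. Below is a toy sketch of the greedy draft-and-verify loop that speculative decoding relies on; `target_next`, `draft_next`, and `speculative_decode` are hypothetical stand-ins, and a real implementation verifies all drafted tokens with a single batched forward pass of the full model.

```python
V = 50  # toy vocabulary size

def target_next(seq):
    """Stand-in for the full model's greedy next-token choice (expensive in reality)."""
    return (3 * seq[-1] + seq[-2]) % V

def draft_next(seq):
    """Stand-in for a cheap draft head: agrees with the target most of the time."""
    t = target_next(seq)
    return (t + 1) % V if seq[-1] % 5 == 0 else t

def speculative_decode(prompt, n_new, k=4):
    seq = list(prompt)
    goal = len(prompt) + n_new
    while len(seq) < goal:
        # 1) Draft k tokens autoregressively with the cheap head.
        drafted, ctx = [], list(seq)
        for _ in range(k):
            t = draft_next(ctx)
            drafted.append(t)
            ctx.append(t)
        # 2) Verify: accept the longest prefix the target agrees with, then
        #    append the target's own next token as a free correction/bonus.
        #    (A real system scores all k positions in one batched target pass.)
        ctx = list(seq)
        for t in drafted:
            if target_next(ctx) != t:
                break
            ctx.append(t)
        ctx.append(target_next(ctx))
        seq = ctx
    return seq[:goal]

print(speculative_decode([1, 2], n_new=12))
```

A useful property of this loop is that its output is identical to decoding with the target model alone; the speedup comes from accepting several drafted tokens per expensive verification pass.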
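
NVFP4 is NVIDIA's block-scaled 4-bit floating-point format: FP4 (E2M1) values that share a scale factor per small block of elements. The sketch below shows a quantize/dequantize round trip in that spirit; it is a simplification (the per-block scale is kept in full precision rather than FP8, and the actual NVFP4 pretraining recipe is described in the Tech Report).

```python
import numpy as np

# The eight non-negative values representable in FP4 E2M1; the full grid is signed.
E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
GRID = np.concatenate([-E2M1[:0:-1], E2M1])

def quantize_block(x, grid=GRID):
    """Quantize one block of values to FP4 with a single shared scale."""
    scale = max(np.abs(x).max() / 6.0, 1e-12)  # 6.0 is the largest E2M1 magnitude
    # NVFP4 stores this scale in FP8 (E4M3); kept in full precision here for clarity.
    q = grid[np.abs(x[:, None] / scale - grid).argmin(axis=1)]  # nearest grid point
    return q, scale

def fp4_roundtrip(x, block=16):                # NVFP4 scales blocks of 16 elements
    out = np.empty_like(x)
    for i in range(0, len(x), block):
        q, s = quantize_block(x[i:i + block])
        out[i:i + block] = q * s
    return out

x = np.random.randn(64).astype(np.float32)
print("mean abs round-trip error:", np.abs(x - fp4_roundtrip(x)).mean())
```

Training directly in such a low-precision format reduces memory traffic and raises math throughput on hardware with native FP4 support, at the cost of the rounding error measured above.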

Key Highlights

Comparison

  • Nemotron 3 Super achieves up to 2.2x and 7.5x higher inference throughput than GPT-OSS-120B and Qwen3.5-122B, respectively, in the 8K-token-input / 16K-token-output setting.
  • Nemotron 3 Super achieves accuracy higher than or comparable to GPT-OSS-120B and Qwen3.5-122B across a diverse set of benchmarks.
  • Nemotron 3 Super supports context lengths of up to 1M tokens while outperforming both GPT-OSS-120B and Qwen3.5-122B on RULER at 1M context length.

Open Source

We are releasing the pre-trained, post-trained, and quantized checkpoints along with the datasets used for training.

Checkpoints:

Data:

Model Recipes:

Tech Report

More technical details are available in the Tech Report.