Accelerating RL Post-Training with Speculative Decoding in NeMo RL
We integrate speculative decoding into NeMo RL with a vLLM backend to accelerate rollout generation while preserving verifier-side training semantics. On 8B-parameter reasoning workloads, this yields up to 1.8x faster rollout generation and up to 1.4x faster RL steps, with projected gains of roughly 2.5x at 235B-parameter scale.
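To make the mechanism behind these speedups concrete, the following is a minimal, self-contained sketch of the draft-and-verify loop at the heart of speculative decoding (a greedy toy variant with made-up token "models", not NeMo RL or vLLM code). A cheap draft model proposes `k` tokens per step; the expensive target model verifies them and accepts the longest agreeing prefix, correcting the first disagreement so the loop always makes progress and the output matches target-only greedy decoding exactly.

```python
def speculative_generate(target, draft, prompt, k, max_len):
    """Greedy speculative decoding sketch.

    target, draft: callables mapping a token list (context) to the
    next token. `draft` is assumed cheap, `target` expensive; in a
    real system the k verification calls are batched into one
    target-model forward pass, which is where the speedup comes from.
    """
    out = list(prompt)
    while len(out) < max_len:
        # Draft phase: propose k tokens autoregressively (cheap).
        ctx = out[:]
        proposed = []
        for _ in range(k):
            t = draft(ctx)
            proposed.append(t)
            ctx.append(t)
        # Verify phase: accept proposals while the target agrees;
        # on the first disagreement, emit the target's token instead.
        for t in proposed:
            expected = target(out)
            if t == expected:
                out.append(t)
            else:
                out.append(expected)
                break
            if len(out) >= max_len:
                break
    return out[:max_len]


# Hypothetical toy models: the target counts up by one; the draft
# agrees except after multiples of 5, where it guesses wrong.
def toy_target(ctx):
    return ctx[-1] + 1

def toy_draft(ctx):
    return ctx[-1] + 2 if ctx[-1] % 5 == 0 else ctx[-1] + 1

print(speculative_generate(toy_target, toy_draft, [0], k=4, max_len=10))
# → [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Note the losslessness property visible even in this toy: the output is identical to decoding with the target model alone, because every accepted token is one the target would have produced. This is why rollout generation can be accelerated without changing what the RL trainer sees.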