Projects

Accelerating RL Post-Training with Speculative Decoding in NeMo RL

We integrate speculative decoding into NeMo RL with a vLLM backend to accelerate rollout generation while preserving verifier-side training semantics. On 8B reasoning workloads, this yields up to 1.8x faster rollout generation and up to 1.4x faster RL steps, with projected gains of roughly 2.5x at 235B scale.
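
A minimal sketch of what enabling speculative decoding on a vLLM rollout engine can look like. The speculative_config keys follow recent vLLM releases and may differ across versions, and the model name and sampling settings are illustrative assumptions; the actual NeMo RL integration is wired differently.

```python
# Hedged sketch: speculative decoding for rollout generation with vLLM.
# Model names and config values are placeholders, not the setup used above.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # target policy (the verifier)
    speculative_config={
        "method": "ngram",            # propose draft tokens via prompt n-gram lookup
        "num_speculative_tokens": 5,  # draft tokens proposed per verification step
        "prompt_lookup_max": 4,
    },
)

# Rejection sampling in the verifier guarantees accepted tokens follow the
# target model's own distribution, so rollouts remain on-policy for training.
params = SamplingParams(temperature=1.0, max_tokens=1024)
outputs = llm.generate(["Solve: 12 * 7 = ?"], params)
print(outputs[0].outputs[0].text)
```

Because speculative decoding is lossless with respect to the target model's sampling distribution, the generated rollouts are unchanged; only the wall-clock cost of producing them drops.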

Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation

We introduce Nemotron-Cascade 2, an open 30B MoE model with 3B activated parameters that delivers best-in-class reasoning and strong agentic capabilities. It is the second open-weight LLM, after DeepSeek-V3.2-Speciale-671B-A37B, to achieve Gold Medal-level 🏅 performance at the 2025 International Mathematical Olympiad (IMO), the International Olympiad in Informatics (IOI), and the ICPC World Finals.
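
A hedged sketch of one on-policy distillation step, the general technique the title refers to: the student samples its own rollouts and is trained toward the teacher's per-token distribution on those samples. All names and interfaces below are illustrative assumptions, not the paper's code, and the multi-domain aspect (selecting a domain-specific teacher per batch) is omitted.

```python
# Hedged sketch of on-policy distillation with a reverse-KL objective.
# `student` and `teacher` are assumed to be HF-style causal LMs; `generate`
# is assumed to return full prompt+completion token sequences.
import torch
import torch.nn.functional as F

def on_policy_distill_step(student, teacher, prompts, optimizer):
    # 1) Student generates its own rollouts (the "on-policy" data).
    with torch.no_grad():
        sequences = student.generate(prompts)

    # 2) Score the same sequences under both models (shift by one so
    #    position t predicts token t+1).
    student_logits = student(sequences).logits[:, :-1]
    with torch.no_grad():
        teacher_logits = teacher(sequences).logits[:, :-1]

    # 3) Reverse KL(student || teacher), averaged over sampled positions.
    s_logp = F.log_softmax(student_logits, dim=-1)
    t_logp = F.log_softmax(teacher_logits, dim=-1)
    loss = (s_logp.exp() * (s_logp - t_logp)).sum(-1).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Training on the student's own samples, rather than teacher-generated text, avoids the train/inference distribution mismatch of standard offline distillation.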

NVIDIA Nemotron 3 Super

We are releasing NVIDIA Nemotron 3 Super, a Mixture-of-Experts hybrid Mamba-Transformer model with 120B total and 12B active parameters. Nemotron 3 Super is the first model in the Nemotron 3 series to leverage Latent MoE, include multi-token prediction (MTP) layers, and be pre-trained in NVFP4.
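
A back-of-envelope sketch of the active-vs-total split in a sparse top-k MoE: each token is routed to only k of the experts, so only a fraction of the weights is touched per token. Every number below is an illustrative assumption chosen to reproduce the 12B/120B headline figures, not the model's real configuration.

```python
# Hedged arithmetic sketch: why a sparse MoE has far fewer "active" than
# total parameters. All values are assumed placeholders.
n_experts = 64           # experts per MoE layer (assumed)
top_k = 4                # experts routed per token (assumed)
expert_params = 1.8e9    # parameters per expert, summed over layers (assumed)
shared_params = 4.8e9    # attention/Mamba blocks, embeddings, etc. (assumed)

total = shared_params + n_experts * expert_params   # all weights stored
active = shared_params + top_k * expert_params      # weights touched per token

print(f"total  ≈ {total / 1e9:.0f}B")   # ≈ 120B
print(f"active ≈ {active / 1e9:.0f}B")  # ≈ 12B
```

The practical upshot is that per-token compute scales with the active count while capacity scales with the total count.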