Accelerating RL Post-Training with Speculative Decoding in NeMo RL
We integrate speculative decoding into NeMo RL with a vLLM backend to accelerate rollout generation while preserving verifier-side training semantics. On 8B-parameter reasoning workloads, this yields up to 1.8x faster rollout generation and up to 1.4x faster RL steps, with projected gains of roughly 2.5x at 235B-parameter scale.
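To make the mechanism behind these speedups concrete, the following is a minimal, self-contained sketch of the draft-and-verify loop at the heart of speculative decoding (a greedy toy variant with made-up token "models", not NeMo RL or vLLM code). A cheap draft model proposes `k` tokens per step; the expensive target model verifies them and accepts the longest agreeing prefix, correcting the first disagreement so the loop always makes progress and the output matches target-only greedy decoding exactly.

```python
def speculative_generate(target, draft, prompt, k, max_len):
    """Greedy speculative decoding sketch.

    target, draft: callables mapping a token list (context) to the
    next token. `draft` is assumed cheap, `target` expensive; in a
    real system the k verification calls are batched into one
    target-model forward pass, which is where the speedup comes from.
    """
    out = list(prompt)
    while len(out) < max_len:
        # Draft phase: propose k tokens autoregressively (cheap).
        ctx = out[:]
        proposed = []
        for _ in range(k):
            t = draft(ctx)
            proposed.append(t)
            ctx.append(t)
        # Verify phase: accept proposals while the target agrees;
        # on the first disagreement, emit the target's token instead.
        for t in proposed:
            expected = target(out)
            if t == expected:
                out.append(t)
            else:
                out.append(expected)
                break
            if len(out) >= max_len:
                break
    return out[:max_len]


# Hypothetical toy models: the target counts up by one; the draft
# agrees except after multiples of 5, where it guesses wrong.
def toy_target(ctx):
    return ctx[-1] + 1

def toy_draft(ctx):
    return ctx[-1] + 2 if ctx[-1] % 5 == 0 else ctx[-1] + 1

print(speculative_generate(toy_target, toy_draft, [0], k=4, max_len=10))
# → [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Note the losslessness property visible even in this toy: the output is identical to decoding with the target model alone, because every accepted token is one the target would have produced. This is why rollout generation can be accelerated without changing what the RL trainer sees.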