 # Nemotron-Labs-Diffusion: A Tri-Mode Language Model Unifying Autoregressive, Diffusion, and Self-Speculation Decoding

  ![](/sites/default/files/publications/demo.gif) 

We introduce Nemotron-Labs-Diffusion, a tri-mode language model (LM) that unifies autoregressive (AR), diffusion, and self-speculation decoding within a single architecture. Trained with a joint AR-diffusion objective, Nemotron-Labs-Diffusion can switch modes to sustain high throughput across deployment settings and concurrency levels. Our study shows that: (1) AR and diffusion objectives are complementary, with diffusion improving lookahead planning and AR providing left-to-right linguistic priors; (2) in self-speculation mode, diffusion drafts while AR verifies, outperforming multi-token prediction (MTP) methods in both acceptance rate and real-device efficiency; and (3) a speed-of-light analysis further demonstrates diffusion’s long-term potential, with up to 76.5% more tokens per forward pass than self-speculation under an optimal sampler. Scaling to 3B, 8B, and 14B parameters, our Nemotron-Labs-Diffusion family, including base, instruct, and vision-language models, consistently outperforms state-of-the-art open-source AR and diffusion LMs in both accuracy and speed. For example, Nemotron-Labs-Diffusion-8B decodes 5.9× more tokens per forward pass than Qwen3-8B with better accuracy, translating to 4× higher throughput on SPEED-Bench with SGLang on a GB200 GPU.

HF collection: <https://huggingface.co/collections/nvidia/nemotron-labs-diffusion>
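The draft-then-verify idea behind the self-speculation mode can be illustrated with a toy sketch. This is not the authors' implementation: `toy_ar_next`, `toy_draft`, `verify_draft`, and `self_speculative_decode` are hypothetical names, the "models" are simple integer rules, and greedy verification is assumed. It only shows the general pattern in which a drafter proposes a block of tokens, a left-to-right verifier accepts the longest matching prefix, and the output stays identical to plain greedy AR decoding while using fewer verification steps.

```python
from typing import Callable, List, Tuple

Token = int


def toy_ar_next(seq: List[Token]) -> Token:
    """Hypothetical stand-in for the AR (verifier) mode: greedy next token."""
    return (seq[-1] + 1) % 10


def toy_draft(seq: List[Token], block_size: int) -> List[Token]:
    """Hypothetical stand-in for the diffusion (drafter) mode.

    Proposes a whole block at once and is deliberately wrong at the
    third position so that rejection can be observed.
    """
    block = [(seq[-1] + 1 + i) % 10 for i in range(block_size)]
    if block_size > 2:
        block[2] = (block[2] + 5) % 10  # injected drafting error
    return block


def verify_draft(
    prefix: List[Token],
    draft: List[Token],
    ar_next: Callable[[List[Token]], Token],
) -> List[Token]:
    """Greedy verification: keep draft tokens until the first mismatch.

    In a real model all positions would be checked in one parallel forward
    pass; the loop here only mimics that result.
    """
    accepted: List[Token] = []
    for tok in draft:
        target = ar_next(prefix + accepted)
        if tok != target:
            accepted.append(target)  # replace the rejected token
            return accepted
        accepted.append(tok)
    accepted.append(ar_next(prefix + accepted))  # bonus token if all accepted
    return accepted


def self_speculative_decode(
    prompt: List[Token], max_new_tokens: int, block_size: int
) -> Tuple[List[Token], int]:
    """Return the decoded sequence and the number of draft/verify steps."""
    out = list(prompt)
    steps = 0
    while len(out) - len(prompt) < max_new_tokens:
        draft = toy_draft(out, block_size)
        out.extend(verify_draft(out, draft, toy_ar_next))
        steps += 1
    return out[: len(prompt) + max_new_tokens], steps


def ar_greedy_decode(prompt: List[Token], max_new_tokens: int) -> List[Token]:
    """Baseline: one token per step."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        out.append(toy_ar_next(out))
    return out


tokens, steps = self_speculative_decode([0], max_new_tokens=6, block_size=4)
baseline = ar_greedy_decode([0], max_new_tokens=6)
print(tokens, steps, tokens == baseline)
```

In this toy setting each step accepts two drafted tokens plus one corrected token, so six new tokens take two steps instead of six. The acceptance rate of a real drafter, and therefore the actual speedup, depends on the model and the workload.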



 ## Authors



[Yonggan Fu](/person/yonggan-fu)

Lexington Whalen (NVIDIA)

Abhinav Garg (NVIDIA)

Chengyue Wu (NVIDIA)

Maksim Khadkevich (NVIDIA)

Nicolai Oswald (NVIDIA)

Enze Xie (NVIDIA)

Daniel Egert (NVIDIA)

Sharath Turuvekere Sreenivas (NVIDIA)

Shizhe Diao (NVIDIA)

Chenhan Yu (NVIDIA)

Ye Yu (NVIDIA)

Weijia Chen (NVIDIA)

Sajad Norouzi (NVIDIA)

Jingyu Liu (University of Chicago)

Shiyi Lan (NVIDIA)

Ligeng Zhu (NVIDIA)

Jin Wang (NVIDIA)

Jindong Jiang (NVIDIA)

Morteza Mardani (NVIDIA)

Mehran Maghoumi (NVIDIA)

Song Han (NVIDIA)

Ante Jukić (NVIDIA)

Nima Tajbakhsh (NVIDIA)

Jan Kautz (NVIDIA)

[Pavlo Molchanov](/person/pavlo-molchanov)

 

 

 ## Publication Date



Tuesday, May 19, 2026

 

 ## Research Area



[Artificial Intelligence and Machine Learning](/research-area/machine-learning-artificial-intelligence)

[Natural Language Processing](/research-area/natural-language-processing)

 

 

 ## External Links



[HF collection](https://huggingface.co/collections/nvidia/nemotron-labs-diffusion)

 

 

 ## Uploaded Files



[Nemotron\_Diffusion\_Tech\_Report.pdf](https://d1qx31qr3h6wln.cloudfront.net/publications/Nemotron_Diffusion_Tech_Report.pdf?VersionId=1tm4XZATEzGV7cs51XAf.xmWupU20vYW "Open file in new window") (3.34 MB)

 

 

 ## Copyright



NVIDIA Open Model License