Nemotron-Labs-Diffusion: A Tri-Mode Language Model Unifying Autoregressive, Diffusion, and Self-Speculation Decoding

We introduce Nemotron-Labs-Diffusion, a tri-mode language model (LM) that unifies AR, diffusion, and self-speculation decoding within a single architecture. Trained with a joint AR-diffusion objective, Nemotron-Labs-Diffusion can switch modes to sustain high throughput across deployment settings and concurrency levels. Our study shows that (1) AR and diffusion objectives are complementary: diffusion improves lookahead planning, while AR provides left-to-right linguistic priors.

iGRPO: Self-Feedback-Driven LLM Reasoning

Large Language Models (LLMs) have shown promise in solving complex mathematical problems, yet they still fall short of producing accurate and consistent solutions. Reinforcement Learning (RL) is a framework for aligning these models with task-specific rewards, improving overall quality and reliability. Group Relative Policy Optimization (GRPO) is an efficient, value-function-free alternative to Proximal Policy Optimization (PPO) that leverages group-relative reward normalization.

RLP: Reinforcement as a Pretraining Objective

The dominant paradigm for training large reasoning models starts with pre-training using next-token prediction loss on vast amounts of data. Reinforcement learning, while powerful in scaling reasoning, is introduced only as the very last phase of post-training, preceded by supervised fine-tuning. While dominant, is this an optimal way of training? In this paper, we present RLP, an information-driven reinforcement pretraining objective, that brings the core spirit of reinforcement learning -- exploration -- to the last phase of pretraining.

Learn from Your Mistakes: Self-Correcting Masked Diffusion Models

Masked diffusion models (MDMs) have emerged as a promising alternative to autoregressive models, enabling parallel token generation while achieving competitive performance. Despite these advantages, MDMs face a fundamental limitation: once tokens are unmasked, they remain fixed, leading to error accumulation and ultimately degrading sample quality. We address this by proposing a framework that trains a model to perform both unmasking and correction.

Short-time, Wavelet-inspired Mouse Submovement Detection

Submovements are ballistic components of human motion constituting a large part of motor interaction and arising from the cyclical and overlapping cognitive processes of perception, motor planning, and motor execution. Extracting submovements is challenging as the motions tend to overlap, or start before the previous ends. We propose and evaluate use of a wavelet-inspired technique to accurately locate and parameterize submovements from one-dimensional speed time series.

Tianyi Xie

Tianyi is a Research Scientist at NVIDIA Research. He did his Ph.D. at the University of California, Los Angeles, where he was advised by Chenfanfu Jiang and Demetri Terzopoulos. He received his B.Eng. in Software Engineering from Shanghai Jiao Tong University. His research focuses on bridging physics-based simulation and generative AI, embedding physical priors into generative pipelines to produce visually compelling and physically consistent content, with applications spanning computer graphics, embodied AI, and robotics.