At NVIDIA, we are developing AI solutions that enable general-purpose humanoid robots to understand the human world, follow language instructions, and perform diverse tasks. A robust Vision-Language-Action (VLA) model is crucial for these capabilities. To this end, we developed GR00T N1, a generalist robot model trained on a diverse dataset spanning egocentric human videos, real and simulated robot trajectories, and synthetic data.