Cosmos-Predict2 — Cosmos Lab

Abstract

Training Physical AI systems in digital environments requires a physical world simulator. Cosmos-Predict2 is the latest version of the Cosmos world model, designed for simulating and predicting the future state of the world as video. Cosmos-Predict2 features four models: Cosmos-Predict2-2B-Text2Image and Cosmos-Predict2-14B-Text2Image for text-to-image generation, and Cosmos-Predict2-2B-Video2World and Cosmos-Predict2-14B-Video2World for video-to-world generation.

Video-to-World Generation

Cosmos-Predict2-14B-Video2World

PBench evaluation of video-to-world generation. For Domain Score, Quality Score, and PBench Score, higher is better (↑).

Model	Domain Score ↑	Quality Score ↑	PBench Score ↑
LTX-Video	74.0	77.2	70.8
HunyuanVideo-I2V	74.0	77.4	70.6
CogVideoX-5B-I2V	74.2	79.5	69.0
Wan2.1-I2V-14B-720P	75.8	81.9	69.7
Cosmos-Predict1-7B-Video2World	73.2	77.4	69.0
Cosmos-Predict1-14B-Video2World	73.3	77.6	69.0
Cosmos-Predict2-2B-Video2World	77.2	84.8	69.6
Cosmos-Predict2-14B-Video2World	77.4	84.9	69.9

Text-to-Image Generation

Cosmos-Predict2-2B / Cosmos-Predict2-14B

GenEval benchmark results for text-to-image generation. For all columns, higher is better (↑).

Model	Overall ↑	Single Obj. ↑	Two Obj. ↑	Counting ↑	Colors ↑	Position ↑	Color attribution ↑
Stable Diffusion XL	0.55	0.98	0.74	0.39	0.85	0.15	0.23
DALL-E 3	0.67	0.96	0.87	0.47	0.83	0.43	0.45
Flux 1-Dev	0.66	0.98	0.79	0.73	0.77	0.22	0.45
Cosmos-Predict2-2B	0.83	1.00	0.99	0.73	0.89	0.65	0.73
Cosmos-Predict2-14B	0.84	1.00	0.98	0.79	0.90	0.64	0.72
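
The comparison in the GenEval table can be checked with a short script. The following is a minimal sketch, not part of the Cosmos-Predict2 codebase: the scores are transcribed from the table above, and the helper names (`best_per_category`, `margin_over_baselines`) are hypothetical, introduced only for illustration. It reports the top-scoring model or models in each category and the gap between the best Cosmos-Predict2 model and the best of the other listed models.

```python
# GenEval scores transcribed from the table above, in column order.
CATEGORIES = ["Overall", "Single Obj.", "Two Obj.", "Counting",
              "Colors", "Position", "Color attribution"]
GENEVAL = {
    "Stable Diffusion XL": [0.55, 0.98, 0.74, 0.39, 0.85, 0.15, 0.23],
    "DALL-E 3":            [0.67, 0.96, 0.87, 0.47, 0.83, 0.43, 0.45],
    "Flux 1-Dev":          [0.66, 0.98, 0.79, 0.73, 0.77, 0.22, 0.45],
    "Cosmos-Predict2-2B":  [0.83, 1.00, 0.99, 0.73, 0.89, 0.65, 0.73],
    "Cosmos-Predict2-14B": [0.84, 1.00, 0.98, 0.79, 0.90, 0.64, 0.72],
}


def best_per_category(scores, categories):
    """Return {category: (top_score, [models that reach it])}."""
    best = {}
    for i, name in enumerate(categories):
        top = max(row[i] for row in scores.values())
        best[name] = (top, [m for m, row in scores.items() if row[i] == top])
    return best


def margin_over_baselines(scores, categories, prefix="Cosmos-Predict2"):
    """Best score among models starting with `prefix` minus best score among the rest."""
    margins = {}
    for i, name in enumerate(categories):
        ours = max(row[i] for m, row in scores.items() if m.startswith(prefix))
        others = max(row[i] for m, row in scores.items() if not m.startswith(prefix))
        margins[name] = round(ours - others, 2)  # table values have two decimals
    return margins


if __name__ == "__main__":
    for name, (top, models) in best_per_category(GENEVAL, CATEGORIES).items():
        print(f"{name:18s} {top:.2f}  {', '.join(models)}")
    print(margin_over_baselines(GENEVAL, CATEGORIES))
```

On the numbers reported above, the largest margins over the other listed models are in Position (0.22) and Color attribution (0.28), while Single Obj. is close to saturated for every model.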

Citation

Please cite as NVIDIA et al. using the following BibTeX:

@misc{nvidia2025cosmospredict2,
  title={Cosmos-Predict2: World Simulation Model for Physical AI},
  author={NVIDIA},
  url={https://github.com/nvidia-cosmos/cosmos-predict2},
  year={2025}
}