Cosmos-Drive-Dreams:
Scalable Synthetic Driving Data Generation with World Foundation Models
Cosmos-Drive-Dreams is a synthetic data generation (SDG) pipeline designed to produce challenging scenarios that improve downstream autonomous vehicle tasks.
Collecting and annotating real-world data for safety-critical physical AI systems, such as autonomous vehicles (AVs), is time-consuming and costly. It is especially challenging to capture rare edge cases, which play a critical role in both training and testing an AV system. To address this challenge, we introduce Cosmos-Drive-Dreams, a synthetic data generation (SDG) pipeline that generates challenging scenarios to facilitate downstream tasks such as perception and driving policy training. Powering this pipeline is Cosmos-Drive, a suite of models specialized from the NVIDIA Cosmos-1 world foundation model for the driving domain, capable of controllable, high-fidelity, multi-view, and spatiotemporally consistent driving video generation. We showcase the utility of these models by applying Cosmos-Drive-Dreams to scale the quantity and diversity of driving datasets with high-fidelity, challenging scenarios. Experimentally, we demonstrate that our generated data helps mitigate long-tail distribution problems and enhances generalization in downstream tasks such as 3D lane detection, 3D object detection, and driving policy learning. We open-source our pipeline toolkit, dataset, and model weights through NVIDIA's Cosmos platform.
Explore Videos Generated by Cosmos-Drive
Diverse Generation with Precise Map Control
Click "HDMap Condition Input" to view the HDMap input. Click the other options to view the corresponding rendered videos.
Note: Condition input MAP can range from simple lane boundaries and traffic signs to more complex HDMap representations.
Multi-View Expansion
The middle video in the first row is the originally generated single-view output. The surrounding videos are generated by expanding this original view into multi-view perspectives.
In-the-wild Video Annotation
Our annotation model automatically predicts HDMap and depth information from in-the-wild driving videos.
Note: We use dashcam videos from the public Nexar dataset to show that we can also generate variations for in-the-wild videos.
LiDAR Generation
Left: Condition map input visualization
Right: Generated LiDAR point cloud
For visualization, the generated LiDAR point cloud is overlaid onto the RGB input frames.
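To make the overlay concrete, here is a minimal sketch of projecting LiDAR points onto an RGB frame, assuming the points are already in the camera frame and a simple pinhole intrinsic matrix K (the released rendering scripts support other camera models as well; the function name and depth color map are ours):

```python
import numpy as np

def overlay_lidar_on_image(points_cam: np.ndarray, K: np.ndarray,
                           image: np.ndarray) -> np.ndarray:
    """Project LiDAR points (N x 3, camera frame) onto an H x W x 3 RGB frame
    and color them by depth for visualization."""
    # Keep only points in front of the camera.
    pts = points_cam[points_cam[:, 2] > 0.1]
    # Pinhole projection: u = fx * x/z + cx, v = fy * y/z + cy.
    uv = (K @ (pts / pts[:, 2:3]).T).T[:, :2].astype(int)
    h, w = image.shape[:2]
    valid = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    uv, depth = uv[valid], pts[valid, 2]
    # Map depth in [near, far] meters to a red-to-blue gradient.
    near, far = 2.0, 75.0
    t = np.clip((depth - near) / (far - near), 0.0, 1.0)
    out = image.copy()
    out[uv[:, 1], uv[:, 0]] = np.stack(
        [255 * (1 - t), np.zeros_like(t), 255 * t], axis=1).astype(np.uint8)
    return out
```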
Cosmos-Drive-Dreams: Synthetic Data Generation (SDG) Pipeline for Autonomous Vehicle Tasks
Our synthetic dataset enhances performance on various downstream autonomous driving tasks by providing diverse and challenging scenarios.

Overview of our Cosmos-Drive-Dreams pipeline. Starting from either structured labels or in-the-wild video, we generate a pixel-aligned HDMap condition video (Step ①). We then leverage a prompt rewriter to generate diverse prompts and synthesize single-view videos (Step ②). Each single-view video is then expanded into multiple views (Step ③). Finally, a Vision-Language Model (VLM) filter performs rejection sampling to automatically discard low-quality samples, yielding a high-quality, diverse SDG dataset (Step ④).
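The four steps above compose into a simple generate-and-filter loop. The sketch below illustrates that data flow only; the stage functions are hypothetical stand-ins, not names from the released toolkit:

```python
from typing import Callable, List

def cosmos_drive_dreams_sdg(
    render_hdmap: Callable,    # Step 1: labels / in-the-wild video -> HDMap condition video
    rewrite_prompt: Callable,  # Step 2: one seed prompt -> n diverse prompts
    generate_view: Callable,   # Step 2: condition video + prompt -> single-view video
    expand_views: Callable,    # Step 3: single view -> multi-view videos
    vlm_filter: Callable,      # Step 4: True if the clip passes quality checks
    source,
    seed_prompt: str,
    n_variations: int = 8,
) -> List:
    """Run one scenario through the four stages and keep accepted clips."""
    condition = render_hdmap(source)
    accepted = []
    for prompt in rewrite_prompt(seed_prompt, n_variations):
        single_view = generate_view(condition, prompt)
        multi_view = expand_views(single_view, condition)
        if vlm_filter(multi_view):  # rejection sampling discards low-quality clips
            accepted.append(multi_view)
    return accepted
```

Because every variation shares the same HDMap condition, the accepted clips differ in appearance (weather, lighting, scene content) while keeping identical ground-truth geometry for downstream labels.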
3D lane detection results with Cosmos-Drive-Dreams on the Waymo Open Dataset and our internal RDS-HQ (2k) dataset. Our pipeline significantly improves 3D lane detection over the baseline. "Cate. Acc." denotes category accuracy.


Cosmos-Drive-Dreams improves the F-score on the 3D lane detection task across varying amounts of real-world training data. SDG clips are mixed with real clips at a synthetic-to-real ratio of R_s2r = 0.5. Left: Results on the test set. Under all weather conditions, SDG consistently improves detection performance across varying amounts of real-world training data, with the most significant gain (+6.0%) observed in the low-data regime (2k clips). Right: Results on the extreme-weather subset of the test set. In more challenging settings (rainy and foggy), the benefits of SDG are even more pronounced, with gains of up to +9.4% under foggy conditions with only 2k real clips. This highlights SDG's effectiveness in enhancing model robustness, particularly under adverse or underrepresented conditions.
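As a toy illustration of what a synthetic-to-real ratio of R_s2r = 0.5 means when assembling a training list (one SDG clip per two real clips), one possible mixing scheme looks like the sketch below; the exact sampling procedure used in our experiments may differ:

```python
import random

def mix_clips(real_clips: list, sdg_clips: list, r_s2r: float = 0.5,
              seed: int = 0) -> list:
    """Build a training list where SDG clips are added at a
    synthetic-to-real ratio r_s2r relative to the real clips."""
    n_sdg = int(round(r_s2r * len(real_clips)))
    rng = random.Random(seed)
    mixed = real_clips + rng.sample(sdg_clips, min(n_sdg, len(sdg_clips)))
    rng.shuffle(mixed)
    return mixed
```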
3D object detection performance with Cosmos-Drive-Dreams. When used to augment the training set, Cosmos-Drive-Dreams improves detection performance under both general and extreme weather conditions.
LiDAR-based 3D object detection results with Cosmos-Drive-Dreams. Cosmos-Drive-Dreams improves the overall detection performance across different vehicle categories and dataset sizes.



Policy learning results with Cosmos-Drive-Dreams. Left: For a given amount of real-world clips, adding SDG data improves trajectory prediction accuracy (minADE on RDS-Bench[Policy]; lower is better). Center: Less real-world data is needed to reach a target minADE. Right: Adding a small amount of targeted SDG data can improve performance on certain corner cases (RDS-Bench[VRU/left]) without hurting overall driving performance.
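For reference, minADE is the standard multi-modal trajectory metric: among K predicted trajectories, it reports the smallest average point-wise L2 error against the ground truth. A minimal NumPy implementation:

```python
import numpy as np

def min_ade(pred: np.ndarray, gt: np.ndarray) -> float:
    """Minimum Average Displacement Error over K hypotheses.

    pred: (K, T, 2) -- K predicted trajectories of T (x, y) waypoints.
    gt:   (T, 2)    -- ground-truth trajectory.
    Returns the smallest per-mode mean L2 error (lower is better).
    """
    errors = np.linalg.norm(pred - gt[None], axis=-1)  # (K, T) point-wise L2
    return float(errors.mean(axis=1).min())            # best mode's average
```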
Cosmos-Drive-Dreams: Toolkits
Explore our comprehensive toolkits for working with Cosmos-Drive models, including dataset conversion tools, rendering scripts, and sample utilities.
Demonstration of the Cosmos-Drive-Dreams toolkits for interactive 3D trajectory editing.
Rendering Scripts
Generate HD map visualizations and LiDAR point cloud renderings with configurable camera models and intrinsics.
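To illustrate what "configurable camera models and intrinsics" means, here is a hedged sketch of two interchangeable projection models; the class names are ours, and the fisheye variant is simplified to the equidistant case r = f·θ rather than the polynomial f-theta models used by real AV cameras:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class PinholeCamera:
    fx: float; fy: float; cx: float; cy: float

    def project(self, pts_cam: np.ndarray) -> np.ndarray:
        """Perspective projection: u = fx*x/z + cx, v = fy*y/z + cy."""
        x, y, z = pts_cam[:, 0], pts_cam[:, 1], pts_cam[:, 2]
        return np.stack([self.fx * x / z + self.cx,
                         self.fy * y / z + self.cy], axis=1)

@dataclass
class FThetaCamera:
    """Simplified wide-FoV model: image radius grows with ray angle theta."""
    f: float; cx: float; cy: float

    def project(self, pts_cam: np.ndarray) -> np.ndarray:
        x, y, z = pts_cam[:, 0], pts_cam[:, 1], pts_cam[:, 2]
        r_xy = np.hypot(x, y)
        theta = np.arctan2(r_xy, z)  # angle from the optical axis
        scale = np.where(r_xy > 1e-9, self.f * theta / r_xy, 0.0)
        return np.stack([scale * x + self.cx, scale * y + self.cy], axis=1)
```

Swapping the camera object changes how the same HD map geometry or LiDAR points land in pixel space, which is what makes the renderings configurable per sensor rig.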
Sample Utilities
Access tools for prompt modification, trajectory generation, and environment transformation for diverse scenario creation.
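In the spirit of the prompt-modification utilities, the following toy sketch (names and phrase lists are ours, not the toolkit's API) shows how a base scene description can be combined with weather and time-of-day overrides to steer the generator toward diverse environments:

```python
import itertools

# Hypothetical variation axes for environment transformation.
WEATHER = ["a heavy rainstorm", "dense fog", "a snowy morning", "golden-hour sun"]
TIME_OF_DAY = ["at night", "at noon", "at dusk"]

def vary_prompt(base: str, n: int = 6) -> list[str]:
    """Expand one seed prompt into n environment variations."""
    combos = itertools.islice(itertools.product(WEATHER, TIME_OF_DAY), n)
    return [f"{base}, {weather}, {tod}" for weather, tod in combos]

prompts = vary_prompt("a suburban street with parked cars and a cyclist")
```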
WebUI-Based 3D Trajectory Editing Tool
Web-based interactive interface for editing 3D trajectories with intuitive controls for scenario customization.