Asset Harvester:
Extracting 3D Assets from Autonomous Driving Logs for Simulation

Tianshi Cao* Jiawei Ren* Yuxuan Zhang Jaewoo Seo Jiahui Huang Shikhar Solanki Haotian Zhang Mingfei Guo Haithem Turki Muxingzi Li Yue Zhu Sipeng Zhang Zan Gojcic Sanja Fidler Kangxue Yin*†

*: Core contributors †: Project lead

NVIDIA


Introduction

Asset Harvester enables object-level manipulation in simulation environments such as NuRec.

Closed-loop simulation is a core component of autonomous vehicle (AV) development, enabling scalable testing, training, and safety validation before real-world deployment. Neural scene reconstruction converts driving logs into interactive 3D environments for simulation, but it does not produce the complete 3D object assets required for agent manipulation and large-viewpoint novel-view synthesis.

To address this challenge, we present Asset Harvester, an image-to-3D model and end-to-end pipeline that converts sparse, in-the-wild object observations from real driving logs into complete, simulation-ready assets.

Rather than relying on a single model component, we developed a system-level design for real-world AV data that combines large-scale curation of object-centric training tuples, geometry-aware preprocessing across heterogeneous sensors, and a robust training recipe that couples sparse-view-conditioned multiview generation with 3D Gaussian lifting. Within this system, SparseViewDiT is explicitly designed to address limited-angle views and other real-world data challenges. Together with hybrid data curation, augmentation, and self-distillation, this system enables scalable conversion of sparse AV object observations into reusable 3D assets.

Asset Harvester turns real-world driving logs into complete, simulation-ready 3D assets — from just one or a few in-the-wild object views. It handles vehicles, pedestrians, riders, and other road objects, even under heavy occlusion, noisy calibration, and extreme viewpoint bias. A multiview diffusion model generates consistent novel viewpoints, and a feed-forward Gaussian reconstructor lifts them to full 3D in seconds. The result: high-fidelity 3D Gaussian splat assets ready for insertion into simulation environments. The pipeline plugs directly into NVIDIA NCore and NuRec for scalable data ingestion and closed-loop simulation.
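To make the viewpoint-bias problem concrete, the sketch below measures how much of an object's circumference is unobserved and lists evenly spaced orbit viewpoints that a multiview generator could be asked to fill in. It is an illustration only: the names `CameraPose`, `orbit_poses`, and `largest_azimuth_gap`, the even azimuth spacing, and the default elevation and radius are all assumptions, not the camera layout or API used by Asset Harvester.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CameraPose:
    """Hypothetical object-centric camera on a sphere around the asset."""
    azimuth_deg: float
    elevation_deg: float
    radius: float

    def position(self):
        # Spherical-to-Cartesian conversion, object at the origin, z up.
        az = math.radians(self.azimuth_deg)
        el = math.radians(self.elevation_deg)
        x = self.radius * math.cos(el) * math.cos(az)
        y = self.radius * math.cos(el) * math.sin(az)
        z = self.radius * math.sin(el)
        return (x, y, z)


def orbit_poses(num_views, elevation_deg=15.0, radius=3.0):
    """Evenly spaced target viewpoints on a full 360-degree orbit (assumed layout)."""
    if num_views < 1:
        raise ValueError("num_views must be at least 1")
    step = 360.0 / num_views
    return [CameraPose(i * step, elevation_deg, radius) for i in range(num_views)]


def largest_azimuth_gap(observed_azimuths_deg):
    """Largest unobserved angular span (degrees) between neighbouring observations."""
    if not observed_azimuths_deg:
        raise ValueError("need at least one observed azimuth")
    angles = sorted(a % 360.0 for a in observed_azimuths_deg)
    # Gaps between consecutive observations, plus the wrap-around gap.
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(angles[0] + 360.0 - angles[-1])
    return max(gaps)


if __name__ == "__main__":
    # An object seen only from a narrow frontal cone, as is typical in a driving log.
    observed = [350.0, 10.0, 30.0]
    print("largest unobserved span (deg):", largest_azimuth_gap(observed))
    for pose in orbit_poses(4):
        print(pose.azimuth_deg, pose.position())
```

A large unobserved span is the situation in which the reconstruction has to rely on generated views rather than observed ones, which is the case the sparse-view-conditioned design described above targets.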

Results

Image-to-3D Results: Vehicles

Image-to-3D Results: Vulnerable Road Users (VRUs)

Tests on Out-of-Distribution (OOD) Images

Animating our VRU assets with Kimodo and SOMA

Synthetic Data Generation (SDG) Using Asset Harvester

A pedestrian crossing the street is narrowly missed by an out-of-control vehicle spinning by.

The ego turns left but is blocked by an obstacle, steers to the right, encounters a VRU, stops, and then begins moving left again.

Citation

@article{cao2026assetharvester,
  title  = {Asset Harvester: Extracting 3D Assets from Autonomous Driving Logs for Simulation},
  author = {Cao, Tianshi and Ren, Jiawei and Zhang, Yuxuan and Seo, Jaewoo and Huang, Jiahui and Solanki, Shikhar and Zhang, Haotian and Guo, Mingfei and Turki, Haithem and Li, Muxingzi and Zhu, Yue and Zhang, Sipeng and Gojcic, Zan and Fidler, Sanja and Yin, Kangxue},
  year   = {2026},
}