Spatial Intelligence Lab, NVIDIA Research

SimULi: Real-Time LiDAR and Camera Simulation with Unscented Transforms


Abstract


Rigorous testing of autonomous robots, such as self-driving vehicles, is essential to ensure their safety in real-world deployments. This requires building high-fidelity simulators to test scenarios beyond those that can be safely or exhaustively collected in the real world. Existing neural rendering methods based on NeRF and 3DGS hold promise but suffer from low rendering speeds or can only render pinhole camera models, hindering their suitability for applications that commonly require high-distortion lenses and LiDAR data. Multi-sensor simulation poses additional challenges, as existing methods handle cross-sensor inconsistencies by favoring the quality of one modality at the expense of others. To overcome these limitations, we propose SimULi, the first method capable of rendering arbitrary camera models and LiDAR data in real time. Our method extends 3DGUT, which natively supports complex camera models, with LiDAR support via an automated tiling strategy for arbitrary spinning LiDAR models and ray-based culling. To address cross-sensor inconsistencies, we design a factorized 3D Gaussian representation and anchoring strategy that reduces mean camera and depth error by up to 40% compared to existing methods. SimULi renders 10-20x faster than ray tracing approaches and 1.5-10x faster than prior rasterization-based work (and handles a wider range of camera models). When evaluated on two widely benchmarked autonomous driving datasets, SimULi matches or exceeds the fidelity of existing state-of-the-art methods across numerous camera and LiDAR metrics.

Method


We model the scene as a dynamic graph and parameterize the background and each actor with camera and LiDAR 3D Gaussians (left). We render camera views similarly to 3DGUT and derive an automated tiling strategy and ray-based culling to efficiently render LiDAR (middle). We sample an image and a LiDAR scan at each training step to optimize our representation (right). To improve camera novel view synthesis with LiDAR-supervised geometry, we anchor camera Gaussians near surfaces via a nearest-neighbor loss (means of LiDAR Gaussians shown below in blue and camera Gaussians in orange).
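
To make the anchoring term above concrete, the following is a minimal sketch of a nearest-neighbor anchoring loss, assuming it simply penalizes the distance from each camera Gaussian mean to its closest LiDAR Gaussian mean. This is our own illustration, not the paper's implementation; the tensor names and the weight lambda_anchor are hypothetical.

    import torch

    def anchoring_loss(camera_means: torch.Tensor,  # (N, 3) camera Gaussian means
                       lidar_means: torch.Tensor    # (M, 3) LiDAR Gaussian means
                       ) -> torch.Tensor:
        # Pairwise Euclidean distances between camera and LiDAR Gaussian means.
        dists = torch.cdist(camera_means, lidar_means)  # (N, M)
        # For every camera Gaussian, the distance to its nearest LiDAR neighbor.
        nn_dist, _ = dists.min(dim=1)
        # Penalize camera Gaussians that drift away from LiDAR-observed geometry.
        return nn_dist.mean()

    # Assumed usage inside a training step, treating the LiDAR means as a fixed target:
    # loss = photometric_loss + lambda_anchor * anchoring_loss(cam_means, lidar_means.detach())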


Representation


We compare our factorized representation and anchoring loss to unified alternatives that are either directly supervised with LiDAR depth or trained solely with camera losses. We render novel views on PandaSet below. Our approach outperforms these alternatives noticeably (and renders LiDAR 2x faster).
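
As a rough illustration of what "factorized" means here, the sketch below gives each node of the dynamic scene graph two independent Gaussian parameter sets, one supervised by camera losses and one by LiDAR losses, instead of a single set shared by both sensors. This is our own illustration under those assumptions, not the paper's API; all field names and feature contents are illustrative.

    from dataclasses import dataclass
    import torch

    @dataclass
    class GaussianSet:
        means: torch.Tensor      # (N, 3) Gaussian centers
        scales: torch.Tensor     # (N, 3) per-axis extents
        rotations: torch.Tensor  # (N, 4) unit quaternions
        opacities: torch.Tensor  # (N,) blending weights
        features: torch.Tensor   # (N, F) appearance (camera) or return (LiDAR) features

    @dataclass
    class SceneNode:
        # One node per background or actor in the dynamic scene graph.
        camera_gaussians: GaussianSet  # optimized with camera losses
        lidar_gaussians: GaussianSet   # optimized with LiDAR losses
        pose: torch.Tensor             # (4, 4) node-to-world transform (time-varying for actors)

    # scene = {"background": SceneNode(...), "actor_0": SceneNode(...), ...}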


Capabilities


Camera Distortion: After training, SimULi can render novel views with arbitrary camera models, including fisheye lenses, as shown in our interactive viewer.
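
In 3DGUT-style rendering, each Gaussian is projected with an unscented transform rather than a model-specific linearization, which is what allows the camera model to be swapped after training. Below is a minimal, self-contained sketch of that idea, using an equidistant fisheye projection as a stand-in for an arbitrary camera model: sigma points sampled around a 3D Gaussian are pushed through the projection and recombined into a 2D mean and covariance. This is our own illustration, not the SimULi code; the focal length, principal point, and kappa are assumed values.

    import numpy as np

    def fisheye_project(p, f=400.0, cx=640.0, cy=480.0):
        """Equidistant fisheye: pixel radius is proportional to the ray angle."""
        x, y, z = p
        r_xy = np.hypot(x, y)
        theta = np.arctan2(r_xy, z)
        scale = f * theta / max(r_xy, 1e-9)
        return np.array([cx + scale * x, cy + scale * y])

    def unscented_project(mu, cov, project, kappa=1.0):
        """Propagate a 3D Gaussian (mu, cov) through a nonlinear projection."""
        n = mu.shape[0]
        # 2n + 1 sigma points: the mean plus symmetric offsets along sqrt((n + kappa) * cov).
        offsets = np.linalg.cholesky((n + kappa) * cov)
        sigma_pts = [mu] + [mu + offsets[:, i] for i in range(n)] + [mu - offsets[:, i] for i in range(n)]
        weights = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
        weights[0] = kappa / (n + kappa)
        # Push every sigma point through the nonlinear camera model.
        projected = np.array([project(p) for p in sigma_pts])
        # Recombine into the projected 2D mean and covariance.
        mean_2d = weights @ projected
        diff = projected - mean_2d
        cov_2d = (weights[:, None] * diff).T @ diff
        return mean_2d, cov_2d

    # Project one Gaussian (camera-space mean and covariance) into fisheye pixel space.
    mu = np.array([0.5, -0.2, 3.0])
    cov = np.diag([0.02, 0.02, 0.05])
    mean_px, cov_px = unscented_project(mu, cov, fisheye_project)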


Comparisons


We compare our method to SOTA baselines on scenes from the Waymo Open Dataset and PandaSet.

Citation



    @article{turki2025simuli,
        title={SimULi: Real-Time LiDAR and Camera Simulation with Unscented Transforms},
        author={Turki, Haithem and Wu, Qi and Kang, Xin and Martinez Esturo, Janick and Huang, Shengyu and Li, Ruilong and 
                Gojcic, Zan and de Lutio, Riccardo},
        journal={Preprint},
        year={2025}
    }