Outdoor Scene Extrapolation with Hierarchical Generative Cellular Automata

¹ Seoul National University

² NVIDIA

³ University of Toronto

⁴ Vector Institute

CVPR 2024, Highlight


hGCA extrapolates fine-grained 3D geometry (blue) from real-world sparse LiDAR scans (yellow), captured by autonomous vehicles.

Abstract

Overview of the proposed approach.

We aim to generate fine-grained 3D geometry from large-scale sparse LiDAR scans, abundantly captured by autonomous vehicles (AV). Contrary to prior work on AV scene completion, we aim to extrapolate fine geometry from unlabeled LiDAR scans and beyond their spatial limits, taking a step towards generating realistic, high-resolution, simulation-ready 3D street environments. We propose hierarchical Generative Cellular Automata (hGCA), a spatially scalable conditional 3D generative model that grows geometry recursively with local kernels, following GCAs, in a coarse-to-fine manner, and is equipped with a light-weight planner to induce global consistency. Experiments on synthetic scenes show that hGCA generates plausible scene geometry with higher fidelity and completeness than state-of-the-art baselines. Our model generalizes strongly from sim to real, qualitatively outperforming baselines on the Waymo Open Dataset. We also show anecdotal evidence of the ability to create novel objects from real-world geometric cues even when trained on limited synthetic content.
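To make "grows geometry recursively with local kernels" concrete, here is a minimal toy sketch of a GCA-style growth loop. It is not the hGCA implementation. The voxel-set state, the L-infinity neighborhood, and the hand-written `toy_rule` (a hypothetical stand-in for a learned local kernel) are all assumptions made for illustration. The only idea carried over from the abstract is that occupancy is sampled repeatedly and only near cells that are already occupied.

```python
import random
from itertools import product


def neighborhood(state, radius=1):
    """All cells within L-infinity distance `radius` of any occupied cell."""
    offsets = list(product(range(-radius, radius + 1), repeat=3))
    return {
        (x + dx, y + dy, z + dz)
        for (x, y, z) in state
        for (dx, dy, dz) in offsets
    }


def toy_rule(cell, state, extent=4):
    """Hypothetical stand-in for a learned kernel: favors a flat ground patch."""
    x, y, z = cell
    return 1.0 if z == 0 and abs(x) <= extent and abs(y) <= extent else 0.0


def gca_step(state, occupancy_prob, rng, radius=1):
    """One stochastic transition: sample occupancy only near occupied cells."""
    return {
        cell
        for cell in neighborhood(state, radius)
        if rng.random() < occupancy_prob(cell, state)
    }


def grow(seed, steps, rng_seed=0):
    """Apply the local transition recursively, starting from a sparse seed."""
    rng = random.Random(rng_seed)
    state = set(seed)
    for _ in range(steps):
        state = gca_step(state, toy_rule, rng)
    return state


grown = grow({(0, 0, 0)}, steps=6)
print(len(grown))
```

Because each step only visits cells near the current shape, the cost of a step in this sketch scales with the occupied surface rather than with the full volume. In a learned model the toy rule would be replaced by a network that predicts the occupancy probabilities.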

Qualitative Results

Synthetic Scenes

The proposed model, hGCA, can generate realistic large outdoor scenes with high fidelity from accumulated LiDAR scans. Below, we show extrapolation results on synthetic datasets from CARLA (top 3 rows) and Karton City (bottom 3 rows), with models trained on a mixture of the two datasets. Our method completes input scans with high fidelity and extrapolates beyond the field of view better than prior methods.

Figure columns (first comparison): Input, JS3CNet, SCPNet, Ours

Figure columns (second comparison): Input, SG-NN, GCA, Ours

Real-world Scenes

hGCA generalizes well to real-world LiDAR scans. Below, we demonstrate completion on real-world Waymo Open Dataset LiDAR scans, where the model is trained on the synthetic datasets shown above. hGCA can generate more complete geometry than accumulated LiDAR scans, which have a limited height range and suffer from occlusion.

Figure columns: Input, Acc. Scans, Ours

Our model is spatially scalable and can complete a 100-meter scene on a single 24 GB GPU. hGCA can even extrapolate hills (bottom) from real-world scans. The input is shown in yellow and the generation in blue.

Generation Process Visualization

We visualize the two-stage coarse-to-fine generation process of hGCA.

Karton City

Waymo
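As a rough illustration of a coarse-to-fine hand-off between two stages, the sketch below upsamples occupied coarse voxels into candidate fine voxels and then filters them with a fine-level rule. The page only states that generation is two-stage and coarse-to-fine. The upsampling factor, the `upsample` and `refine` helpers, and the `keep` rule are hypothetical and chosen only to show the idea, not to describe how hGCA connects its stages.

```python
from itertools import product


def upsample(coarse_cells, factor=2):
    """Map each occupied coarse voxel to the factor**3 fine voxels it covers."""
    children = list(product(range(factor), repeat=3))
    return {
        (x * factor + dx, y * factor + dy, z * factor + dz)
        for (x, y, z) in coarse_cells
        for (dx, dy, dz) in children
    }


def refine(candidates, keep):
    """Keep only the fine candidates accepted by a (hypothetical) fine rule."""
    return {cell for cell in candidates if keep(cell)}


# Two adjacent coarse voxels along the x axis.
coarse = {(0, 0, 0), (1, 0, 0)}
fine_candidates = upsample(coarse, factor=2)

# Toy fine rule (assumption): retain only the bottom layer of fine voxels.
fine = refine(fine_candidates, keep=lambda cell: cell[2] == 0)
print(len(fine_candidates), len(fine))
```

The point of the sketch is that the fine stage only examines cells inside the region the coarse stage proposed, so fine-resolution work stays proportional to the coarse occupancy rather than to the whole scene volume.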

Citation


            @InProceedings{Zhang_2024_CVPR,
                author={Zhang, Dongsu and Williams, Francis and Gojcic, Zan and Kreis, Karsten and
                    Fidler, Sanja and Kim, Young Min and Kar, Amlan},
                title={Outdoor Scene Extrapolation with Hierarchical Generative Cellular Automata},
                booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
                    Recognition (CVPR)},
                month={June},
                year={2024},
                pages={20145-20154}
            }

Acknowledgment

This work was supported in part by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) [No. 2021-0-01343, Artificial Intelligence Graduate School Program (Seoul National University)] and by the Creative-Pioneering Researchers Program through Seoul National University.
