Outdoor Scene Extrapolation with Hierarchical Generative Cellular Automata


1 Seoul National University
2 NVIDIA
3 University of Toronto
4 Vector Institute
CVPR 2024, Highlight

hGCA extrapolates fine-grained 3D geometry (blue) from real-world sparse LiDAR scans (yellow), captured by autonomous vehicles.

Abstract


Overview of the proposed approach.

We aim to generate fine-grained 3D geometry from large-scale sparse LiDAR scans, which autonomous vehicles (AVs) capture in abundance. In contrast to prior work on AV scene completion, we extrapolate fine geometry from unlabeled scans and beyond their spatial limits, taking a step toward generating realistic, high-resolution, simulation-ready 3D street environments. We propose hierarchical Generative Cellular Automata (hGCA), a spatially scalable conditional 3D generative model that grows geometry recursively with local kernels, following GCAs, in a coarse-to-fine manner, equipped with a lightweight planner to induce global consistency. Experiments on synthetic scenes show that hGCA generates plausible scene geometry with higher fidelity and completeness than state-of-the-art baselines. Our model generalizes strongly from sim to real, qualitatively outperforming baselines on the Waymo Open dataset. We also show anecdotal evidence of its ability to create novel objects from real-world geometric cues even when trained on limited synthetic content.
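To give a rough intuition for a GCA-style growth rule, here is a minimal, illustrative sketch in Python. The learned occupancy network is replaced by a stub probability function, and the function names and neighborhood logic are our assumptions, not the paper's implementation:

```python
import numpy as np

def gca_growth_step(occupied, prob_fn, rng, radius=1):
    """One GCA-style growth step: each occupied voxel proposes its local
    neighborhood, and a (learned, here stubbed) model decides which
    candidate voxels become occupied in the next state."""
    offsets = [(dx, dy, dz)
               for dx in range(-radius, radius + 1)
               for dy in range(-radius, radius + 1)
               for dz in range(-radius, radius + 1)]
    candidates = set()
    for (x, y, z) in occupied:
        for (dx, dy, dz) in offsets:
            candidates.add((x + dx, y + dy, z + dz))
    # Sample the next occupancy state from per-voxel probabilities.
    return {c for c in candidates if rng.random() < prob_fn(c)}

# Toy usage: grow from a single seed voxel with a distance-based stub
# in place of the network (keep voxels whose L1 norm is at most 2).
rng = np.random.default_rng(0)
prob = lambda v: 1.0 if sum(abs(x) for x in v) <= 2 else 0.0
state = {(0, 0, 0)}
for _ in range(3):
    state = gca_growth_step(state, prob, rng)
```

Because the kernel is local, each step only touches voxels near the current surface, which is what makes this style of growth amenable to large scenes.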

Qualitative Results


Synthetic Scenes

The proposed model, hGCA, generates realistic, high-fidelity large outdoor scenes from accumulated LiDAR scans. Below, we show extrapolation results on synthetic scenes from CARLA (top three rows) and Karton City (bottom three rows), with models trained on a mixture of the two datasets. Our method completes input scans with high fidelity and extrapolates beyond the field of view, outperforming prior methods.

Input
JS3CNet
SCPNet
Ours
Input
SG-NN
GCA
Ours

Real-world Scenes

hGCA generalizes well to real-world LiDAR scans. Below, we demonstrate completion on real-world Waymo Open LiDAR scans with a model trained only on the synthetic datasets shown above. hGCA generates more complete geometry than accumulated LiDAR scans, which have a limited height range and suffer from occlusion.

Input
Acc. Scans
Ours

Our model is spatially scalable: it can complete a 100-meter scene on a single 24 GB GPU. hGCA can even extrapolate hills (bottom) from real-world scans. Input is shown in yellow and generated geometry in blue.

Input
Completion
 
Input
Completion
 
Input
Completion

Generation Process Visualization

We visualize hGCA's two-stage coarse-to-fine generation process.

Karton City

Waymo
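The two-stage coarse-to-fine process visualized above can be caricatured as follows. This is a minimal sketch under our own assumptions: `refine_fn` stands in for the learned fine-stage model, and the upsampling factor is illustrative, not taken from the paper:

```python
def coarse_to_fine(coarse_state, refine_fn, factor=2):
    """Upsample a coarse occupancy set into a finer voxel grid, then let a
    (stubbed) fine-stage model keep or reject each high-resolution voxel."""
    fine_candidates = set()
    for (x, y, z) in coarse_state:
        for dx in range(factor):
            for dy in range(factor):
                for dz in range(factor):
                    fine_candidates.add((x * factor + dx,
                                         y * factor + dy,
                                         z * factor + dz))
    return {v for v in fine_candidates if refine_fn(v)}

# Toy usage: two coarse voxels, with a stub refiner that keeps only the
# ground layer (z == 0) of the upsampled candidates.
coarse = {(0, 0, 0), (1, 0, 0)}
fine = coarse_to_fine(coarse, refine_fn=lambda v: v[2] == 0)
```

The coarse stage fixes the global layout cheaply, so the fine stage only needs to resolve local detail inside the coarse occupancy.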

Citation



@InProceedings{Zhang_2024_CVPR,
    author    = {Zhang, Dongsu and Williams, Francis and Gojcic, Zan and Kreis, Karsten and Fidler, Sanja and Kim, Young Min and Kar, Amlan},
    title     = {Outdoor Scene Extrapolation with Hierarchical Generative Cellular Automata},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {20145-20154}
}
        

Acknowledgment


This work was in part supported by an Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) [No. 2021-0-01343, Artificial Intelligence Graduate School Program (Seoul National University)] and by the Creative-Pioneering Researchers Program through Seoul National University.