Dynamic Vision and Learning Group NVIDIA Research

SeMoLi: What Moves Together Belongs Together

1 NVIDIA
2 Technical University of Munich
3 University of Adelaide
CVPR 2024

Abstract


We present Segment Moving in Lidar (SeMoLi), which tackles semi-supervised object detection based on motion cues. Recent results suggest that heuristic-based clustering methods in conjunction with object trackers can be used to pseudo-label instances of moving objects and that these pseudo-labels can serve as supervisory signals to train 3D object detectors on Lidar data without manual supervision. We re-think this approach and suggest that both object detection and motion-inspired pseudo-labeling can be tackled in a data-driven manner. We leverage recent advances in scene flow estimation to obtain point trajectories from which we extract long-term, class-agnostic motion patterns. Revisiting correlation clustering in the context of message passing networks, we learn to group these motion patterns to cluster points into object instances. By estimating the full extent of the objects, we obtain per-scan 3D bounding boxes that we use to supervise a Lidar object detection network. Our method not only outperforms prior heuristic-based approaches (57.5 AP, a +14 AP improvement over prior work; see visualization below), but, more importantly, we show that we can pseudo-label and train object detectors across datasets.

SeMoLi

DBSCAN++

3D Object Detection: A PointPillars detector trained with SeMoLi pseudo-labels (left) predicts significantly less noisy detections than a detector trained with the baseline pseudo-labels (right; DBSCAN++, Najibi et al., ECCV 2023). We visualize bounding boxes with a detection confidence threshold of >= 0.2.

SeMoLi Pseudo-Label Generation


Pseudo-Label Generation Pipeline

To generate pseudo-labels for Lidar, SeMoLi clusters points based on per-point motion patterns using a Message Passing Network (MPN). Given the positions of all points in the filtered Lidar point cloud, we generate a kNN graph over the input point cloud in which each point represents a node. This graph represents a first hypothesis of which points could belong together (positive edges) and which not (negative edges). We add node features representing a combination of motion patterns and position to each point, as well as proximity-based edge features. SeMoLi performs several message passing steps to classify the initial edge hypotheses as positive or negative. The connected components of the resulting graph represent object point clusters from which we can extract bounding boxes. Since our bounding boxes are modal, we utilize Segmentation Intersection over Union (SegIoU) to evaluate the clustering performance of SeMoLi. For the final pseudo-label quality evaluation and PointPillars performance, we utilize 3D bounding box Intersection over Union (3DIoU).
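The graph construction and clustering steps above can be sketched with plain numpy/scipy. This is a minimal illustration, not the released implementation: the learned message-passing edge classifier is replaced by a hypothetical distance-based score, and the motion and edge features are omitted.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components

def knn_edges(points, k=2):
    """Build a kNN edge list over the point cloud (each point is a node)."""
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k + 1)  # first hit is the point itself
    src = np.repeat(np.arange(len(points)), k)
    dst = idx[:, 1:].reshape(-1)
    return src, dst

def cluster_points(n_points, src, dst, edge_scores, thresh=0.5):
    """Keep edges classified as positive; connected components = instances."""
    keep = edge_scores >= thresh
    adj = coo_matrix(
        (np.ones(keep.sum()), (src[keep], dst[keep])), shape=(n_points, n_points)
    )
    _, labels = connected_components(adj, directed=False)
    return labels

# Toy scene: two well-separated chains of points, i.e., two "objects".
pts = np.array([[0.1 * i, 0.0, 0.0] for i in range(10)]
               + [[10.0 + 0.1 * i, 0.0, 0.0] for i in range(10)])
src, dst = knn_edges(pts, k=2)
# Placeholder for the MPN edge classifier: positive iff endpoints are close.
scores = (np.linalg.norm(pts[src] - pts[dst], axis=1) < 1.0).astype(float)
labels = cluster_points(len(pts), src, dst, scores)
print(len(set(labels.tolist())))  # 2 instance clusters
```

In the actual method, the per-edge score comes from the learned MPN rather than a distance threshold; the surrounding graph machinery stays the same.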

Segment Moving in Lidar (SeMoLi) for Pseudo-Labeling: We first preprocess the point cloud to remove static points and predict per-point trajectories (Wang et al., CVPR 2022) on the filtered point cloud (preprocessing and trajectory prediction). Then, we extract velocity-based features from the trajectories and learn to cluster, i.e., segment points based on motion patterns using a Message Passing Network (Gilmer et al., 2017) in a fully data-driven manner (SeMoLi). From point segments, we extract bounding boxes and inflate them (extracting and inflating bounding boxes). Finally, we apply our approach to unlabeled Lidar streams to obtain pseudo-labels that we use to train object detectors.
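The "extracting and inflating bounding boxes" step can be sketched as follows. This is an illustrative axis-aligned version under assumed conventions (the paper's boxes are fit to point segments and inflated to better cover the full object extent; the `inflate` padding value here is hypothetical):

```python
import numpy as np

def box_from_segment(points, inflate=0.1):
    """Fit an axis-aligned 3D box to a point segment and inflate each face
    by `inflate` meters (hypothetical padding; axis-aligned for brevity)."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    center = (lo + hi) / 2.0
    size = (hi - lo) + 2 * inflate
    return center, size

# Toy segment spanning 2 m x 1 m x 0.5 m.
pts = np.array([[0.0, 0.0, 0.0], [2.0, 1.0, 0.5]])
center, size = box_from_segment(pts, inflate=0.1)
print(center)  # [1.   0.5  0.25]
print(size)    # [2.2  1.2  0.7 ]
```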

Evaluation of Pseudo-Label Quality: Cross-Class and Cross-Dataset Generalization

Pseudo-Label Visualization on Waymo Open Dataset: We visualize our pseudo-labels on Waymo Open Dataset and compare them visually with pseudo-labels generated by DBSCAN++ (Najibi et al., ECCV 2023). DBSCAN++ is significantly more sensitive to noise that occurs during the pre-processing step (static point removal).

We train SeMoLi in a class-agnostic manner utilizing motion patterns as well as position, i.e., Gestalt principles that can be applied to any arbitrary object. To evaluate the quality of our pseudo-labels, we compute precision and recall in a class-agnostic manner utilizing 3D bounding box Intersection over Union (3DIoU) as well as point cloud Segmentation Intersection over Union (SegIoU). The latter metric evaluates SeMoLi's ability to cluster points, while the former evaluates the localization error with respect to the ground truth amodal bounding boxes. To determine SeMoLi's per-class performance, we assign each pseudo-label to the class of any ground truth bounding box it overlaps with. Since unmatched pseudo-labels are not assigned to any class, we get no measure of false positive pseudo-labels per class and, hence, cannot compute per-class precision. However, the recall metrics show that, thanks to SeMoLi's class-agnostic training, we perform well across different classes! Furthermore, evaluating SeMoLi trained on Waymo Open Dataset on the Argoverse2 dataset shows that our generalist approach allows for dataset generalization. Overall, we significantly improve performance over our heuristic-based baseline, DBSCAN++ (Najibi et al., ECCV 2023)!
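The class-agnostic precision/recall computation can be sketched as a greedy IoU matching between pseudo-labels and ground truth boxes. This is a simplified illustration with axis-aligned boxes and an assumed greedy matching strategy; the actual evaluation uses 3DIoU on oriented boxes (or SegIoU on point segments):

```python
import numpy as np

def iou_3d(a, b):
    """3D IoU for axis-aligned boxes given as (min_xyz, max_xyz) pairs."""
    lo = np.maximum(a[0], b[0])
    hi = np.minimum(a[1], b[1])
    inter = np.prod(np.clip(hi - lo, 0, None))
    vol = lambda box: np.prod(box[1] - box[0])
    return inter / (vol(a) + vol(b) - inter)

def precision_recall(pseudo, gt, thresh=0.7):
    """Class-agnostic matching: each GT box greedily claims its best
    still-unused pseudo-label above the IoU threshold."""
    matched, used = 0, set()
    for g in gt:
        ious = [iou_3d(p, g) if i not in used else 0.0
                for i, p in enumerate(pseudo)]
        if ious and max(ious) >= thresh:
            matched += 1
            used.add(int(np.argmax(ious)))
    prec = matched / len(pseudo) if pseudo else 0.0
    rec = matched / len(gt) if gt else 0.0
    return prec, rec

gt = [(np.zeros(3), np.ones(3))]                  # one GT box: unit cube
pseudo = [(np.zeros(3), np.ones(3)),              # perfect match
          (np.full(3, 10.0), np.full(3, 11.0))]   # unmatched false positive
prec, rec = precision_recall(pseudo, gt, thresh=0.7)
print(prec, rec)  # 0.5 1.0
```

The unmatched far-away box is exactly the kind of pseudo-label that counts toward overall precision but cannot be attributed to any class, which is why per-class precision is unavailable.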

Class-wise Evaluation of Pseudo-Labels: For class-wise evaluation, we assign GT classes to pseudo-labels that have any overlap with a GT bounding box. We additionally report the percentage of unmatched false positives (uFP), i.e., pseudo-labels not matched to any GT box.

Cross-Dataset Generalization: We evaluate SeMoLi, trained on 90% of the Waymo Dataset, on the Argoverse2 dataset. We do not train SeMoLi on Argoverse2; we merely pseudo-label it and train a detector using the generated pseudo-labels. We merge Bicycle and Bicyclist as well as Motorcycle and Motorcyclist since they are not distinguishable by motion.

PointPillars trained with SeMoLi pseudo-labels


Less Over- and Under-Segmentation, Higher Cross-Class Generalization

Utilizing SeMoLi's pseudo-labels to train PointPillars to detect objects (O) leads to less over- and under-segmentation as well as better cross-class generalization than utilizing heuristic-based pseudo-label generation approaches. We visually contrast SeMoLi with DBSCAN++ (Najibi et al., ECCV 2023), which tackles a similar problem via density-based clustering. We visualize the whole point cloud in purple and dynamic points, used as input to our method and the baseline to localize moving instances, in green. We color-code individual segmented instances. From left to right, SeMoLi (i) segments objects even in sparse point clouds and suffers less from under-segmentation, (ii) is able to learn to filter noise from the filtered point cloud, (iii) leads to less over-segmentation, and (iv) generalizes better to different classes. Best viewed in color, zoomed in.

Towards Learning to Pseudo-Label: Comparison of PointPillars' ability to detect objects (O) when trained using pseudo-labels generated by our data-driven approach SeMoLi versus our heuristic-based baseline DBSCAN++ (Najibi et al., ECCV 2023).

Cross-Dataset Generalization

We show that utilizing SeMoLi trained on Waymo Open Dataset to generate pseudo-labels on the Argoverse2 dataset allows for cross-dataset generalization. Below we visualize the detection performance of PointPillars trained on the SeMoLi pseudo-labels on two validation sequences of the Argoverse2 dataset.

Cross-Dataset Generalization: We evaluate SeMoLi, trained on 90% of the labeled Waymo Dataset, on the Argoverse2 dataset and visualize bounding boxes with a detection confidence threshold larger than 0.3.

Quantitative Detection Results: Higher Localization Accuracy, Less Noise

The quantitative evaluation of PointPillars trained using SeMoLi's pseudo-labels shows that we achieve better localization accuracy (higher recall) while predicting fewer false positives (higher precision), as we show in the table below.

Semi-Supervised 3D Object Detection on Waymo Open Dataset: We evaluate models on all objects (top) and only moving objects (bottom) on the Waymo Open validation set. % GT indicates the amount of labeled training data; % Pseudo indicates the amount of pseudo-labeled data.


Training SeMoLi on Waymo Open Dataset, applying it to the Argoverse2 dataset to generate pseudo-labels, and utilizing those pseudo-labels to train PointPillars shows that we are able to generalize between datasets (see table below).

Cross-Dataset Results: We train the PointPillars detector on ground truth data as well as on pseudo-labels generated with SeMoLi trained on Waymo Open Dataset.

SeMoLi Data Split


To provide a fair comparison, we propose to split the original dataset into separate train and evaluation splits for SeMoLi and PointPillars. Additionally, since our pseudo-labels focus on moving objects, we evaluate the final detector performance both on moving objects only and on static and moving objects together. To this end, we utilize the original evaluation sets as test sets and split the original training set into our train and evaluation splits.

Train and Validation Splits: We conduct our experiments using the Waymo Open and Argoverse2 datasets, for which manual labels are available. For training and validation of SeMoLi, we utilize the training split of the original datasets. We fix two separate validation sets, one for validating pseudo-labels (val_pseudo) and one for final detector performance (val_det). We report performance for varying ratios x of data used for training SeMoLi (train_pseudo) and for generating pseudo-labels to train our detector (train_det). To evaluate the performance of PointPillars, we utilize the original evaluation set (test_set).
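The split logic described above can be sketched as a simple partition over sequence IDs. The validation-set sizes and the sequence-ID naming below are hypothetical; only the split names (val_pseudo, val_det, train_pseudo, train_det) and the ratio x come from the text:

```python
def make_splits(sequences, x, n_val_pseudo=20, n_val_det=20):
    """Partition training sequences into two fixed validation sets plus a
    ratio-x split between SeMoLi training and detector pseudo-labeling."""
    val_pseudo = sequences[:n_val_pseudo]
    val_det = sequences[n_val_pseudo:n_val_pseudo + n_val_det]
    rest = sequences[n_val_pseudo + n_val_det:]
    cut = int(round(x * len(rest)))
    return {
        "val_pseudo": val_pseudo,       # validates pseudo-label quality
        "val_det": val_det,             # validates detector performance
        "train_pseudo": rest[:cut],     # labeled data used to train SeMoLi
        "train_det": rest[cut:],        # pseudo-labeled to train the detector
    }

# Hypothetical example: 100 training sequences, x = 0.1.
splits = make_splits([f"seq_{i:03d}" for i in range(100)], x=0.1)
print(len(splits["train_pseudo"]), len(splits["train_det"]))  # 6 54
```

Keeping train_pseudo and train_det disjoint ensures the detector is never trained on pseudo-labels for sequences SeMoLi itself was trained on.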

Citation



    @inproceedings{Seidenschwarz2024semoli,
        title={SeMoLi: What Moves Together Belongs Together},
        author={Jenny Seidenschwarz and Aljoša Ošep and Francesco Ferroni and Simon Lucey 
        and Laura Leal-Taixé},
        year={2024},
        booktitle={CVPR}
    }

Paper


SeMoLi: What Moves Together Belongs Together

Jenny Seidenschwarz, Aljoša Ošep, Francesco Ferroni, Simon Lucey, Laura Leal-Taixé

Paper
BibTeX