We present Segment Moving in Lidar (SeMoLi) to tackle semi-supervised object detection based on motion cues. Recent results suggest that heuristic-based clustering methods in conjunction with object trackers can be used to pseudo-label instances of moving objects, and that these pseudo-labels can serve as supervisory signals to train 3D object detectors on Lidar data without manual supervision. We re-think this approach and suggest that both object detection and motion-inspired pseudo-labeling can be tackled in a data-driven manner. We leverage recent advances in scene flow estimation to obtain point trajectories from which we extract long-term, class-agnostic motion patterns. Revisiting correlation clustering in the context of message passing networks, we learn to group these motion patterns to cluster points into object instances. By estimating the full extent of the objects, we obtain per-scan 3D bounding boxes that we use to supervise a Lidar object detection network. Our method not only outperforms prior heuristic-based approaches (57.5 AP, a +14 AP improvement over prior work; see visualization below); more importantly, we show that we can pseudo-label and train object detectors across datasets.
To generate pseudo-labels for Lidar data, SeMoLi clusters points based on per-point motion patterns using message passing networks (MPNs). Given the positions of all points in the filtered Lidar point cloud, we generate a kNN graph over the input point cloud in which each point represents a node. This graph represents an initial hypothesis of which points belong together (positive edges) and which do not (negative edges). We attach node features to each point that combine motion patterns and position, as well as proximity-based edge features. SeMoLi performs several message passing steps to classify the initial edge hypotheses as positive or negative. The connected components of the resulting graph represent object point clusters from which we extract bounding boxes. Since our bounding boxes are modal, we utilize Segmentation Intersection over Union (SegIoU) to evaluate the clustering performance of SeMoLi. For the final pseudo-label quality evaluation and the PointPillars performance we utilize 3D bounding box Intersection over Union (3DIoU).
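To make the graph construction and clustering steps concrete, below is a minimal sketch (not the released implementation) of how a kNN graph with position- and motion-based node features and proximity-based edge features could be assembled, and how connected components could be extracted once edges have been classified by the MPN. The helper names, the neighborhood size k, and the exact feature choices are illustrative assumptions.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components
from sklearn.neighbors import NearestNeighbors

def build_knn_graph(points, motion_feats, k=16):
    """Build a kNN graph over the filtered (dynamic) Lidar points.

    points:       (N, 3) xyz positions
    motion_feats: (N, F) per-point motion descriptors, e.g. features of
                  accumulated scene-flow trajectories (assumed precomputed)
    k:            neighborhood size (illustrative default)
    """
    nn = NearestNeighbors(n_neighbors=k + 1).fit(points)
    _, idx = nn.kneighbors(points)                 # first neighbor is the point itself
    src = np.repeat(np.arange(len(points)), k)
    dst = idx[:, 1:].reshape(-1)                   # drop self-edges

    node_feats = np.concatenate([points, motion_feats], axis=1)
    edge_feats = points[src] - points[dst]         # proximity-based edge features
    edge_index = np.stack([src, dst])              # (2, N * k)
    return node_feats, edge_index, edge_feats

def clusters_from_edges(edge_index, edge_scores, num_points, thresh=0.5):
    """Keep edges classified as positive and return connected components."""
    keep = edge_scores > thresh                    # edge_scores come from the MPN edge classifier
    src, dst = edge_index[0][keep], edge_index[1][keep]
    adj = coo_matrix((np.ones(len(src)), (src, dst)),
                     shape=(num_points, num_points))
    _, labels = connected_components(adj, directed=False)
    return labels                                  # per-point cluster id
```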
We train SeMoLi in a class-agnostic manner utilizing motion patterns as well as position, i.e., Gestalt principles that can be applied to any arbitrary object. To evaluate the quality of our pseudo-labels, we compute precision and recall in a class-agnostic manner utilizing 3D bounding box Intersection over Union (3DIoU) as well as point cloud Segmentation Intersection over Union (SegIoU). The latter metric evaluates SeMoLi's ability to cluster points, while the former evaluates the localization error with respect to the ground truth amodal bounding boxes. To determine SeMoLi's per-class performance, we assign each pseudo-label to the class of any potential ground truth bounding box it overlaps with. Since unassigned pseudo-labels do not belong to any class, we do not obtain a measure of false positive pseudo-labels per class and, hence, cannot compute per-class precision. However, investigating the recall metrics shows that, thanks to SeMoLi's class-agnostic training, we perform well across different classes! Furthermore, evaluating SeMoLi trained on the Waymo Open Dataset on the Argoverse2 dataset shows that our generic, data-driven approach allows for dataset generalization. Overall, we significantly improve the performance over our heuristic-based baseline, DBSCAN++ (Najibi et al., ECCV 2023)!
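As an illustration of the class-agnostic evaluation, the sketch below computes SegIoU between predicted point clusters and ground truth instance masks and derives recall at a given IoU threshold. The greedy matching and the threshold value are simplifying assumptions, not the exact evaluation protocol.

```python
import numpy as np

def seg_iou(pred_mask, gt_mask):
    """Point-level IoU between two boolean point masks of the same scan."""
    inter = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return inter / union if union > 0 else 0.0

def class_agnostic_recall(pred_labels, gt_labels, iou_thresh=0.5):
    """Fraction of ground truth instances covered by some predicted cluster.

    pred_labels: (N,) cluster id per point (-1 for unclustered points)
    gt_labels:   (N,) ground truth instance id per point (-1 for background)
    """
    gt_ids = [g for g in np.unique(gt_labels) if g != -1]
    pred_ids = [p for p in np.unique(pred_labels) if p != -1]
    matched = 0
    for g in gt_ids:
        gt_mask = gt_labels == g
        # simplified matching: a ground truth instance counts as recalled if
        # any predicted cluster overlaps it with SegIoU above the threshold
        best = max((seg_iou(pred_labels == p, gt_mask) for p in pred_ids),
                   default=0.0)
        matched += best >= iou_thresh
    return matched / max(len(gt_ids), 1)
```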
Utilizing SeMoLi's pseudo-labels to train PointPillars to detect objects leads to less over- and under-segmentation as well as better cross-class generalization than utilizing heuristic-based pseudo-label generation approaches. We visually contrast SeMoLi with DBSCAN++ (Najibi et al., ECCV 2023), which tackles a similar problem via density-based clustering. We visualize the whole point cloud in purple, and dynamic points, used as input to our method and the baseline to localize moving instances, in green. We color-code individual segmented instances. From left to right, SeMoLi (i) segments objects even for sparse point clouds and suffers less from under-segmentation, (ii) is able to learn to filter noise from the filtered point cloud, (iii) leads to less over-segmentation, and (iv) generalizes better to different classes. Best seen in color, zoomed.
We show that utilizing SeMoLi trained on the Waymo Open Dataset to generate pseudo-labels on the Argoverse2 dataset allows for cross-dataset generalization. Below we visualize the detection performance of PointPillars trained on these SeMoLi pseudo-labels on two validation sequences of the Argoverse2 dataset.
The quantitative evaluation of PointPillars trained using SeMoLi's pseudo-labels shows that we achieve better localization accuracy (higher recall) while predicting fewer false positives (higher precision), as shown in the table below.
To provide a fair comparison, we propose to split the original dataset into separate train and evaluation splits for SeMoLi and PointPillars. Additionally, since our pseudo-labels focus on moving objects, we evaluate the final detector performance both on moving objects only and on static and moving objects. To this end, we utilize the original evaluation sets as test sets and split the original training set into our train and evaluation splits, as sketched below.
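A minimal sketch of such a sequence-level split is shown below; the split fraction, seed, and key names are hypothetical placeholders, not the splits used in the paper.

```python
import random

def split_sequences(train_sequence_ids, semoli_fraction=0.5, seed=0):
    """Split the original training sequences into disjoint SeMoLi and
    PointPillars splits; the original validation set is kept as test set.

    semoli_fraction is a hypothetical placeholder, not the paper's ratio.
    """
    ids = sorted(train_sequence_ids)
    random.Random(seed).shuffle(ids)
    n_semoli = int(len(ids) * semoli_fraction)
    return {
        "semoli_train": ids[:n_semoli],     # sequences used to train SeMoLi
        "detector_train": ids[n_semoli:],   # sequences pseudo-labeled to train PointPillars
    }
```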
@inproceedings{seidenschwarz2024semoli,
  title={SeMoLi: What Moves Together Belongs Together},
  author={Jenny Seidenschwarz and Aljoša Ošep and Francesco Ferroni and Simon Lucey and Laura Leal-Taixé},
  year={2024},
  booktitle={CVPR}
}