Tasks such as 3D scene segmentation and detection typically require large labeled datasets that capture variations in 3D layouts and poses of objects. In this work, we propose a general framework for the design of neural networks that exhibit invariance to piecewise rigid symmetries in 3D inputs. The design exploits the simple fact that an object's representation should not change when individual objects undergo rigid transformations, which enables us to improve data efficiency dramatically. Our framework facilitates learning in scenarios with limited data and is particularly effective in one-shot settings, where it even outperforms methods that capitalize on large-scale synthetic datasets.
Paper
Matan Atzmon, Jiahui Huang, Francis Williams, Or Litany
Approximately Piecewise \(E(3)\) Equivariant Point Networks
Main Idea
In this work we propose a novel framework to extend networks that are equivariant with respect to a single global \(E(3)\) transformation, to accommodate inputs made of multiple parts, each of which exhibits local \(E(3)\) symmetry.
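One way to make the target property concrete is sketched below. The notation is introduced here for illustration only and is not taken from the paper. Let \(x_1, \dots, x_n \in \mathbb{R}^3\) be the input points, let \(z(i) \in \{1, \dots, k\}\) assign point \(i\) to a part, and let each part \(j\) carry its own transformation \((R_j, t_j) \in E(3)\). For a network \(f\) that outputs one vector per point, piecewise equivariance could then be written as

\[
f\big(R_{z(1)} x_1 + t_{z(1)}, \dots, R_{z(n)} x_n + t_{z(n)}\big)_i = R_{z(i)}\, f(x_1, \dots, x_n)_i, \qquad i = 1, \dots, n.
\]

With \(k = 1\) this reduces to equivariance under a single global \(E(3)\) transformation. For invariant outputs such as segmentation labels, the right-hand side is simply \(f(x_1, \dots, x_n)_i\).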
The core challenge is that, in practical settings, the partition into individually transforming regions is unknown a priori. Errors in the predicted partition unavoidably translate into errors in respecting the true input symmetry.
Past works have proposed different ways to predict the partition, which may exhibit uncontrolled errors in their ability to maintain equivariance to the actual partition. To this end, we introduce APEN: a general framework for constructing approximate piecewise \(E(3)\) equivariant point networks. Our framework offers an adaptable design with guaranteed bounds on the resulting piecewise \(E(3)\) equivariance approximation errors.
Our primary insight is that functions which are equivariant with respect to a finer partition (compared to the unknown true partition) also maintain equivariance with respect to the true partition. Leveraging this observation, we propose a compositional design for a partition prediction model. It starts from a fine partition and incrementally transitions towards a coarser subpartition of the true one, consistently maintaining piecewise equivariance with respect to the current partition.
As a result, the equivariance approximation error can be bounded solely in terms of (i) uncertainty quantification of the partition prediction, and (ii) bounds on the probability of failing to suggest a proper subpartition of the ground-truth partition.
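The observation about finer partitions can be checked numerically on a toy example. The sketch below is not APEN itself: the feature function (distance of each point to its part centroid) and all names and label arrays are hypothetical choices made for illustration. It applies an independent rigid transformation to each ground-truth part and compares how the feature behaves under a finer partition and under a partition that merges points across true parts.

```python
import numpy as np

def random_rigid(rng):
    """Sample a random orthogonal matrix (via QR) and a translation, i.e. an E(3) element."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    return q, rng.normal(size=3)

def transform_parts(points, labels, rng):
    """Apply an independent rigid transformation to each part given by `labels`."""
    out = points.copy()
    for part in np.unique(labels):
        rot, shift = random_rigid(rng)
        mask = labels == part
        out[mask] = points[mask] @ rot.T + shift
    return out

def centroid_distance_features(points, labels):
    """Toy feature that is E(3)-invariant per part: distance to the part centroid."""
    feats = np.empty(len(points))
    for part in np.unique(labels):
        mask = labels == part
        centroid = points[mask].mean(axis=0)
        feats[mask] = np.linalg.norm(points[mask] - centroid, axis=1)
    return feats

rng = np.random.default_rng(0)
points = rng.normal(size=(12, 3))

true_labels = np.repeat([0, 1, 2], 4)      # hypothetical ground-truth parts
fine_labels = np.repeat(np.arange(6), 2)   # splits every true part in two
mixed_labels = np.repeat([0, 1], 6)        # merges points from different true parts

# Move each ground-truth part with its own rigid transformation.
moved = transform_parts(points, true_labels, rng)

fine_error = np.abs(
    centroid_distance_features(points, fine_labels)
    - centroid_distance_features(moved, fine_labels)
).max()
mixed_error = np.abs(
    centroid_distance_features(points, mixed_labels)
    - centroid_distance_features(moved, mixed_labels)
).max()

print(f"finer partition error: {fine_error:.2e}")
print(f"mixed partition error: {mixed_error:.2e}")
```

Every piece of the finer partition lies inside a single true part, so it moves rigidly as a whole and the feature is unchanged up to floating-point error. The mixed partition groups points that moved differently, so the same feature is no longer preserved.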
Key Results
APEN demonstrates a distinct improvement in generalization accuracy compared to previous state-of-the-art 3D neural network models. We considered two data types exemplifying part-based symmetry: (i) human motions, characterized by articulated parts exhibiting rigid movement; and (ii) real-world scans of room scenes containing multiple furniture-type objects captured at different timestamps. Our method's efficacy is validated across various training data regimes, including one-shot predictions, extreme differences between training and test data distributions, and training with data augmentation.
Human body part segmentation. Qualitative results of generalization to human actions unseen during training, compared to a previous state-of-the-art neural network baseline.
Real-world room scans one-shot segmentation. Qualitative results demonstrate superiority over methods that capitalize on a large-scale synthetic dataset for training.