We introduce DMTet, a deep 3D conditional generative model that can synthesize high-resolution 3D shapes using simple user guides such as coarse voxels. It marries the merits of implicit and explicit 3D representations by leveraging a novel hybrid 3D representation. Compared to the current implicit approaches, which are trained to regress the signed distance values, DMTet directly optimizes for the reconstructed surface, which enables us to synthesize finer geometric details with fewer artifacts. Unlike deep 3D generative models that directly generate explicit representations such as meshes, our model can synthesize shapes with arbitrary topology. The core of DMTet includes a deformable tetrahedral grid that encodes a discretized signed distance function and a differentiable marching tetrahedra layer that converts the implicit signed distance representation to the explicit surface mesh representation. This combination allows joint optimization of the surface geometry and topology as well as generation of the hierarchy of subdivisions using reconstruction and adversarial losses defined explicitly on the surface mesh. Our approach significantly outperforms existing work on conditional shape synthesis from coarse voxel inputs, trained on a dataset of complex 3D animal shapes. Project page: https://nv-tlabs.github.io/DMTet/.
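The marching tetrahedra (MT) layer at the core of this hybrid representation is simple to illustrate: within each tetrahedron, the surface crosses every edge whose endpoint SDF values have opposite signs, and the crossing point is found by linear interpolation along that edge. Below is a minimal, dependency-free sketch of the per-tetrahedron case analysis; it is illustrative only, and all names are our own, not from the paper's implementation:

```python
def interp_vertex(p1, p2, s1, s2):
    # Linearly interpolate to the zero crossing of the SDF along edge (p1, p2)
    t = s1 / (s1 - s2)
    return tuple(a + t * (b - a) for a, b in zip(p1, p2))

def marching_tet(verts, sdf):
    """Extract the zero-level-set triangles from one tetrahedron.

    verts: four 3D points (tuples); sdf: four signed distance values.
    Returns a list of triangles, each a list of three 3D points.
    """
    inside = [i for i in range(4) if sdf[i] < 0]
    outside = [i for i in range(4) if sdf[i] >= 0]
    if not inside or not outside:
        return []  # no sign change: tetrahedron is entirely on one side
    if len(inside) == 1 or len(outside) == 1:
        # One lone vertex: the surface cuts off a corner -> one triangle
        lone = inside[0] if len(inside) == 1 else outside[0]
        others = [i for i in range(4) if i != lone]
        return [[interp_vertex(verts[lone], verts[o], sdf[lone], sdf[o])
                 for o in others]]
    # Two vertices on each side: the surface is a quad -> two triangles
    a, b = inside
    c, d = outside
    p_ac = interp_vertex(verts[a], verts[c], sdf[a], sdf[c])
    p_ad = interp_vertex(verts[a], verts[d], sdf[a], sdf[d])
    p_bc = interp_vertex(verts[b], verts[c], sdf[b], sdf[c])
    p_bd = interp_vertex(verts[b], verts[d], sdf[b], sdf[d])
    return [[p_ac, p_ad, p_bd], [p_ac, p_bd, p_bc]]
```

The 16 sign configurations of a tetrahedron reduce to these three cases (no crossing, a single triangle, or a quad split into two triangles), which is what makes marching tetrahedra considerably simpler than marching cubes and free of its ambiguous configurations.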
Qualitative results on 3D Shape Synthesis from Coarse Voxels. Compared with all baselines, our method (fifth column) reconstructs shapes with much higher quality. Adding a GAN (highlighted in orange) further improves the realism of the generated shapes. We also show the retrieved shapes from the training set in the second-to-last column.
Qualitative results on 3D Reconstruction from Point Clouds. Our model reconstructs shapes with more geometric detail than baselines using different representations: voxels, deforming a mesh with a fixed template, deforming a mesh generated from a volumetric representation, a tetrahedral mesh, and implicit functions.
Quantitative Results on Point Cloud Reconstruction (Chamfer L1). Our model outperforms ConvONet, the SOTA implicit approach, across all object categories even when running at a low grid resolution (in yellow). Our model runs significantly faster at inference time and produces an explicit mesh as output, making it suitable for interactive graphics applications.
With volume and surface subdivision (in blue), we can further boost the reconstruction quality. In this case, we apply volume subdivision once, which doubles the grid resolution. The runtime only doubles instead of growing cubically as the resolution increases.
We demonstrate the effect of learning on the surface by comparing with the oracle performance of MT/MC evaluated on ShapeNet chairs. Without deforming the grid, DMTet outperforms the oracle performance of MT by a large margin when querying the same number of points, even though DMTet predicts the surface from a noisy point cloud. This demonstrates that directly optimizing the reconstructed surface largely mitigates the discretization errors imposed by MT.
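A loss defined on the extracted surface can reach the implicit field because each extracted vertex is a differentiable function of the SDF values at its edge endpoints: with the interpolation parameter t = s1 / (s1 - s2), the derivatives are ∂t/∂s1 = -s2 / (s1 - s2)² and ∂t/∂s2 = s1 / (s1 - s2)². The sketch below verifies these analytic derivatives against central finite differences; it is our own illustration of the general principle, not code from the paper:

```python
def zero_crossing_t(s1, s2):
    # Parametric location of the SDF zero crossing along an edge
    return s1 / (s1 - s2)

def grad_t(s1, s2):
    # Analytic derivatives of t w.r.t. the SDF values at the edge endpoints
    d = (s1 - s2) ** 2
    return -s2 / d, s1 / d

# Check the analytic gradient against central finite differences
s1, s2, eps = -0.3, 0.7, 1e-6
g1, g2 = grad_t(s1, s2)
fd1 = (zero_crossing_t(s1 + eps, s2) - zero_crossing_t(s1 - eps, s2)) / (2 * eps)
fd2 = (zero_crossing_t(s1, s2 + eps) - zero_crossing_t(s1, s2 - eps)) / (2 * eps)
assert abs(g1 - fd1) < 1e-5 and abs(g2 - fd2) < 1e-5
```

Because the gradient bypasses any fixed sign-configuration lookup, losses on the mesh (e.g., Chamfer distance or an adversarial loss) can move the surface continuously through the grid, which is what allows the method to sidestep MT's discretization error.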
@inproceedings{shen2021dmtet,
title = {Deep Marching Tetrahedra: a Hybrid Representation for High-Resolution 3D Shape Synthesis},
author = {Tianchang Shen and Jun Gao and Kangxue Yin and Ming-Yu Liu and Sanja Fidler},
year = {2021},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)}
}
Deep Marching Tetrahedra: a Hybrid Representation for High-Resolution 3D Shape Synthesis
Tianchang Shen, Jun Gao, Kangxue Yin, Ming-Yu Liu, Sanja Fidler