Progressive Learning of 3D Reconstruction Network from 2D GAN Data

Published on arXiv, 2023

Recommended citation: Aysegul Dundar, Jun Gao, Andrew Tao, Bryan Catanzaro. "Progressive Learning of 3D Reconstruction Network from 2D GAN Data." arXiv, 2023.

Figure: Progressive3DModel (teaser).

Abstract

This paper presents a method to reconstruct high-quality textured 3D models from single images. Current methods rely on datasets with expensive annotations: multi-view images and their camera parameters. Our method instead relies on GAN-generated multi-view image datasets, which have negligible annotation cost. However, such images are not strictly multi-view consistent, and GANs sometimes output distorted images, which degrades reconstruction quality. To overcome these limitations of generated datasets, we make two main contributions that lead to state-of-the-art results on challenging objects: 1) a robust multi-stage learning scheme that gradually relies more on the model's own predictions when calculating losses, and 2) a novel adversarial learning pipeline with online pseudo-ground-truth generation to achieve fine details. Our work provides a bridge from the 2D supervision of GAN models to 3D reconstruction models and removes the need for expensive annotation efforts. We show significant improvements over previous methods, whether they were trained on GAN-generated multi-view images or on real images with expensive annotations.
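As a rough illustration of the dataset-generation idea, the sketch below samples a GAN whose latent space is assumed to separate object identity from viewpoint, synthesizing approximate multi-view sets with silhouettes from an off-the-shelf segmenter. The names `gan`, `segment`, and `viewpoint_codes` are hypothetical placeholders, not the paper's actual pipeline.

```python
# Hypothetical sketch of multi-view dataset generation from a 2D GAN.
# Assumption: the generator's latent splits into an object (content) code
# and a controllable viewpoint code; `gan` and `segment` are placeholder
# callables, not the authors' released components.
import torch

@torch.no_grad()
def generate_multiview_dataset(gan, segment, n_objects, viewpoint_codes):
    """Returns a list of (images, masks) pseudo-multi-view sets.

    For each sampled object latent, the GAN synthesizes the same object
    under several fixed viewpoint codes; a segmentation model provides
    silhouettes. The views are only approximately consistent, which is
    what the multi-stage training scheme is designed to absorb.
    """
    dataset = []
    for _ in range(n_objects):
        z_obj = torch.randn(1, gan.z_dim)      # identity / content code
        views, masks = [], []
        for z_cam in viewpoint_codes:          # fixed set of viewpoints
            img = gan(z_obj, z_cam)            # (1, 3, H, W) in [-1, 1]
            views.append(img)
            masks.append(segment(img))         # (1, 1, H, W) silhouette
        dataset.append((torch.cat(views), torch.cat(masks)))
    return dataset
```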

Method


Overview of the dataset generation (a) and the multi-stage training scheme of the reconstruction network (b). The generator network takes an input image and outputs mesh and texture predictions. In the first stage, the output is rendered from a view other than the input image, and losses are calculated on this novel view. This way, the model is not affected by missing parts in the images or by the unrealistic segmentation maps resulting from them. In the second stage, an additional reconstruction loss is added from the same view; the rendered and ground-truth images are masked based on the silhouette predictions of the model. Lastly, to achieve sharp and realistic predictions, we add adversarial training in the third stage.
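A minimal PyTorch-style sketch of this three-stage schedule follows. Every component here (`ReconNet`, the stub `render`, the discriminator interface, the 64x64 image shapes, and detaching the silhouette mask) is an illustrative assumption standing in for the paper's actual architecture and differentiable renderer.

```python
# Hypothetical sketch of the three-stage training losses described above.
# ReconNet and render are trivial stubs; a real pipeline would use a proper
# encoder-decoder and a rasterization-based differentiable renderer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReconNet(nn.Module):
    """Stub generator: maps an image to mesh vertices and a UV texture."""
    def __init__(self, n_verts=642, tex_res=64):
        super().__init__()
        self.n_verts, self.tex_res = n_verts, tex_res
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.to_verts = nn.Linear(32, n_verts * 3)
        self.to_tex = nn.Linear(32, 3 * tex_res * tex_res)

    def forward(self, img):
        f = self.encoder(img)
        verts = self.to_verts(f).view(-1, self.n_verts, 3)
        tex = self.to_tex(f).view(-1, 3, self.tex_res, self.tex_res)
        return verts, tex

def render(verts, tex, cam):
    """Stub renderer returning an RGB image and a silhouette (cam unused)."""
    b = verts.shape[0]
    rgb = tex.mean(dim=(2, 3), keepdim=True).expand(b, 3, 64, 64)
    sil = torch.sigmoid(verts.norm(dim=-1).mean(-1)).view(b, 1, 1, 1)
    return rgb, sil.expand(b, 1, 64, 64)

def training_losses(model, img_in, img_novel, cam_in, cam_novel,
                    stage, disc=None):
    """Images are assumed to be (B, 3, 64, 64); stage is 1, 2, or 3."""
    verts, tex = model(img_in)

    # Stage 1: supervise only from a novel view, so missing parts and
    # unrealistic segmentation maps in the input view do not drive training.
    rgb_n, sil_n = render(verts, tex, cam_novel)
    loss = F.l1_loss(rgb_n, img_novel)

    if stage >= 2:
        # Stage 2: add a same-view reconstruction loss; rendered and target
        # images are masked by the model's predicted silhouette (detached
        # here so the mask itself is not optimized by this term -- an
        # assumption of this sketch).
        rgb_s, sil_s = render(verts, tex, cam_in)
        mask = sil_s.detach()
        loss = loss + F.l1_loss(rgb_s * mask, img_in * mask)

    if stage >= 3 and disc is not None:
        # Stage 3: non-saturating adversarial loss on the rendered view;
        # pseudo-ground-truth images would serve as the "real" samples
        # when training the discriminator.
        loss = loss + F.softplus(-disc(rgb_n)).mean()
    return loss
```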

Results

Qualitative reconstruction results on the Car, Horse, and Bird categories (image gallery).

Authors

Aysegul Dundar

Jun Gao

Andrew Tao

Bryan Catanzaro