Training deep networks with limited labeled data while achieving strong generalization is key
to reducing human annotation effort. This is the goal of semi-supervised learning, which exploits
more widely available unlabeled data to complement small labeled data sets. In this paper, we propose a novel
framework for discriminative pixel-level tasks using a generative model of both images and labels. Concretely,
we learn a generative adversarial network that captures the joint image-label distribution and is trained
efficiently using a large set of unlabeled images supplemented with only a few labeled ones. We build our
architecture on top of StyleGAN2, augmented with a label synthesis branch. Image labeling at test time is
achieved by first embedding the target image into the joint latent space via an encoder network and test-time optimization,
and then generating the label from the inferred embedding. We evaluate our approach in two important domains:
medical image segmentation and part-based face segmentation. We demonstrate strong in-domain performance compared to
several baselines, and are the first to showcase extreme out-of-domain generalization, such as transferring from CT to MRI
in medical imaging, and from photographs of real faces to paintings, sculptures, and even cartoons and animal faces.
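The embed-then-label procedure can be illustrated with a toy stand-in. Everything below is an assumption for illustration: the linear "generator", the least-squares "encoder", and the plain squared-error loss are placeholders, whereas the actual model optimizes a StyleGAN2 latent code with richer losses.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, image_dim = 8, 32
G_img = rng.normal(size=(image_dim, latent_dim))  # toy image-synthesis branch
G_lbl = rng.normal(size=(image_dim, latent_dim))  # toy label-synthesis branch

# Target "image" lies in the generator's range so optimization can fit it.
target = G_img @ rng.normal(size=latent_dim)

# "Encoder" step 0: a noisy least-squares guess for the latent code.
w = np.linalg.lstsq(G_img, target + rng.normal(size=image_dim), rcond=None)[0]
err0 = np.linalg.norm(G_img @ w - target)

# Test-time optimization: gradient descent on ||G_img @ w - target||^2 / 2.
lr = 0.01
for _ in range(200):
    w -= lr * G_img.T @ (G_img @ w - target)

err = np.linalg.norm(G_img @ w - target)
label = G_lbl @ w  # the label is read off the refined latent code
print(err0, err)
```

The key point the sketch preserves is that the label branch is never optimized directly: only the shared latent code is refined against the image, and the label follows from it.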
(Model trained only on the real human-face dataset CelebA.)
Interpolation Between CelebA Images
Interpolation Between CelebA and Out-of-domain MetFaces
Interpolation Between CelebA and Out-of-domain Cariface
Interpolation Between CelebA and Extreme Out-of-domain Data
Chest X-ray Segmentation
Interpolation Between Random Chest X-ray Latent Codes
Test-time Optimization Steps Visualization on Out-of-domain Data
CT-MRI Liver Segmentation
(Model trained on CT dataset only)
Interpolation Between Random CT Latent Codes
Interpolation Between Liver CT and CHAOS Liver T1-MRI
Face Parts Segmentation Optimization Steps
Image reconstructions and segmentation label predictions at different steps during the optimization process.
Step 0 corresponds to using the latent code predicted by the encoder without any further optimization.
The model was trained on CelebA-Mask data. Hence, the first example corresponds to in-domain data,
while the other two examples, from the MetFaces dataset, are out-of-domain cases.
Latent Space Interpolations
Linear interpolations between two random latent codes: from CelebA images to other CelebA, MetFaces, and cartoon images.
We show both the interpolated images and their semantic segmentation labels.
The results show that the generative model learnt a smooth latent space with meaningful images along the interpolation path.
Furthermore, we observe consistency between images and predicted labels along the interpolation path.
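The interpolations shown here are plain linear blends in latent space. A minimal sketch, with a toy `decode` standing in for the trained generator (which, in the actual model, would emit an image-label pair per latent code):

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim = 8
decode = np.tanh  # stand-in for the generator's decoder

w_a = rng.normal(size=latent_dim)  # first random latent code
w_b = rng.normal(size=latent_dim)  # second random latent code

# Evenly spaced points on the straight line between the two codes.
ts = np.linspace(0.0, 1.0, 7)
frames = [decode((1.0 - t) * w_a + t * w_b) for t in ts]
```

Smoothness of the learned latent space is what makes the intermediate `frames` look like plausible images rather than ghosted blends of the endpoints.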
Quantitative Results
Chest X-ray Lung Segmentation. Numbers are Dice scores. JSRT is the in-domain dataset, on which we both train and evaluate. We also evaluate on additional out-of-domain datasets (NLM, NIH, SZ). Ours as well as the other semi-supervised methods use an additional 108k unlabeled data samples.
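For reference, the Dice score reported in these tables follows the standard definition for binary masks; the exact thresholding and averaging conventions behind the table entries are not spelled out here, so treat this as the textbook definition only:

```python
import numpy as np

def dice(pred, gt):
    """Dice score 2|P∩G| / (|P|+|G|) for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 2.0 * inter / denom if denom else 1.0  # empty-vs-empty = perfect

pred = np.array([[1, 1, 0], [0, 1, 0]])
gt   = np.array([[1, 0, 0], [0, 1, 1]])
print(dice(pred, gt))  # 2*2 / (3+3) ≈ 0.667
```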
Skin Lesion Segmentation. Numbers are the Jaccard (JC) index. ISIC is the in-domain dataset, on which we both train and evaluate. We also evaluate on additional out-of-domain datasets (PH2, IS, Quest). Ours as well as the other semi-supervised methods use an additional 33k unlabeled data samples.
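Likewise, the JC (Jaccard) index is the standard intersection-over-union of binary masks; how scores are averaged across images for the table is an assumption not stated here:

```python
import numpy as np

def jaccard(pred, gt):
    """Jaccard index |P∩G| / |P∪G| for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0  # empty-vs-empty = perfect

pred = np.array([1, 1, 0, 1, 0], dtype=bool)
gt   = np.array([1, 0, 0, 1, 1], dtype=bool)
print(jaccard(pred, gt))  # 2 / 4 = 0.5
```

Jaccard and Dice are monotonically related (J = D / (2 − D)), so both tables rank methods the same way; they just scale overlap differently.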
CT-MRI Transfer Liver Segmentation. Numbers are Dice scores per patient. CT is the in-domain dataset, on which we both train and evaluate. We also evaluate on additional unseen MRI T1-in and MRI T1-out data from the CHAOS dataset. Ours as well as the semi-supervised methods use an additional 70 CT volumes from the LITS2017 testing set as unlabeled data samples for training.
Face Part Segmentation. Numbers are mIoU. We train on CelebA and evaluate on CelebA (denoted "In") as well as on the MetFaces dataset. "Train labels" denotes the number of annotated examples used during training. Our model as well as the semi-supervised baselines additionally use 28k unlabeled CelebA data samples.
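mIoU averages per-class intersection-over-union over the label classes. A standard sketch; how classes absent from both prediction and ground truth are handled varies between implementations, and skipping them here is an assumption:

```python
import numpy as np

def miou(pred, gt, num_classes):
    """Mean IoU over classes, skipping classes absent from both masks."""
    ious = []
    for c in range(num_classes):
        p, g = pred == c, gt == c
        union = np.logical_or(p, g).sum()
        if union:
            ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious))

pred = np.array([0, 0, 1, 1, 2, 2])  # predicted class per pixel
gt   = np.array([0, 1, 1, 1, 2, 0])  # ground-truth class per pixel
print(miou(pred, gt, 3))  # (1/3 + 2/3 + 1/2) / 3 = 0.5
```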