Toronto AI Lab

BigDatasetGAN: Synthesizing ImageNet with Pixel-wise Annotations

Daiqing Li 1
Huan Ling1,2,3
Seung Wook Kim 1,2,3
Karsten Kreis1
Adela Barriuso
Sanja Fidler1,2,3
Antonio Torralba4

1NVIDIA
2University of Toronto
3Vector Institute
4MIT

[Paper]   [ArXiv]   [BibTeX]   [Code]   [Dataset]


BigDatasetGAN overview: (1) We sample a few images per class from BigGAN and manually annotate them with masks. (2) We train a feature interpreter branch on top of BigGAN's and VQGAN's features on this data, turning these GANs into generators of labeled data. (3) We sample large synthetic datasets from BigGAN & VQGAN. (4) We use these datasets for training segmentation models.

Our synthesized pixel-wise labeled ImageNet dataset. We sample both images and masks for each of the 1k ImageNet classes.

Annotating images with pixel-wise labels is a time-consuming and costly process. Recently, DatasetGAN showcased a promising alternative: synthesizing a large labeled dataset via a generative adversarial network (GAN) by exploiting a small set of manually labeled, GAN-generated images. Here, we scale DatasetGAN to the class diversity of ImageNet. We take image samples from the class-conditional generative model BigGAN trained on ImageNet, and manually annotate 5 images per class, for all 1k classes. By training an effective feature segmentation architecture on top of BigGAN, we turn BigGAN into a labeled dataset generator. We further show that VQGAN can similarly serve as a dataset generator, leveraging the already annotated data. We create a new ImageNet benchmark by labeling an additional set of 8k real images and evaluate segmentation performance in a variety of settings. Through an extensive ablation study, we show big gains from leveraging a large generated dataset to train different supervised and self-supervised backbone models on pixel-wise tasks. Furthermore, we demonstrate that using our synthesized datasets for pre-training leads to improvements over standard ImageNet pre-training on several downstream datasets, such as PASCAL-VOC, MS-COCO, Cityscapes and chest X-ray, as well as on downstream tasks (detection and segmentation). Our benchmark will be made public, and we will maintain a leaderboard for this challenging task.




News



CVPR Presentation

 




Methods

Architecture of BigDatasetGAN based on BigGAN. We augment BigGAN with a segmentation branch using BigGAN's features. We exploit the rich semantic features of generative models in order to synthesize paired data, segmentation masks and images, turning generative models into dataset generators.
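As a rough illustration of this design, the sketch below shows a feature-interpreter segmentation branch that projects intermediate generator feature maps to a common resolution and predicts per-pixel logits. The module names, channel widths, and fusion scheme are illustrative assumptions, not the exact architecture released with the paper.

# Minimal sketch (not the released code) of a feature-interpreter segmentation
# branch on top of frozen generator features. Channel widths, module names and
# the fusion scheme are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureInterpreter(nn.Module):
    """Predicts per-pixel class logits from intermediate generator feature maps."""

    def __init__(self, feature_channels, num_classes, hidden=128):
        super().__init__()
        # One 1x1 projection per feature resolution to a shared channel width.
        self.proj = nn.ModuleList(
            [nn.Conv2d(c, hidden, kernel_size=1) for c in feature_channels]
        )
        self.head = nn.Sequential(
            nn.Conv2d(hidden * len(feature_channels), hidden, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, num_classes, kernel_size=1),
        )

    def forward(self, features, out_size):
        # Upsample every projected feature map to the output resolution and
        # concatenate along channels before the prediction head.
        ups = [
            F.interpolate(p(f), size=out_size, mode="bilinear", align_corners=False)
            for p, f in zip(self.proj, features)
        ]
        return self.head(torch.cat(ups, dim=1))

# Usage: hook per-block activations while sampling from the frozen GAN, then
# supervise the interpreter with the few manually annotated masks.
interpreter = FeatureInterpreter(feature_channels=[1536, 768, 384, 192], num_classes=2)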



Dataset Analysis

We provide analyses of our synthesized datasets compared to the real annotated ImageNet samples. We compare image and label quality in terms of distribution metrics using the real annotated dataset as reference. We also compare various label statistics and perform shape analysis on labeled polygons in terms of shape complexity and diversity.

Dataset analysis. We report image & mask statistics across our datasets. We compute image and label quality using FID and KID, with the Real-annotated dataset as reference. IN: instance count per image; MI: ratio of mask area to image area; BI: ratio of the mask's tight bounding box area to image area; MB: ratio of mask area to the area of its tight bounding box; PL: polygon length (polygon normalized to width and height of 1); SC: shape complexity, measured by the number of points in a simplified polygon; SD: shape diversity, measured by the mean pair-wise Chamfer distance per class, averaged across classes.
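As a concrete reference for the SD statistic, the snippet below computes the mean pair-wise Chamfer distance between normalized boundary polygons of one class. It is a sketch of the metric as defined above, not the exact evaluation code.

# Illustrative computation of the SD (shape diversity) statistic: mean pair-wise
# Chamfer distance between normalized boundary polygons of a single class.
import itertools
import numpy as np

def chamfer(a, b):
    """Symmetric Chamfer distance between two (N, 2) point sets."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def shape_diversity(polygons):
    """Mean pair-wise Chamfer distance over all polygons of one class.

    Each polygon is an (N, 2) array normalized to a unit bounding box,
    matching the PL/SD normalization described in the table caption.
    """
    pairs = list(itertools.combinations(polygons, 2))
    return float(np.mean([chamfer(a, b) for a, b in pairs])) if pairs else 0.0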


Examples from our datasets: Real-annotated (real ImageNet subset labeled manually), Synthetic-annotated (BigGAN’s samples labeled manually), and the synthetic BigGAN-sim and VQGAN-sim datasets. Notice the high quality of the synthetically sampled labeled examples.


Mean shapes from our BigGAN-sim dataset. For our 100k BigGAN-sim dataset, each class has around 100 samples. We crop the mask from the segmentation label and run k-means with 5 clusters to extract the major modes of the selected ImageNet class shapes.
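The sketch below outlines this procedure: crop and resize each foreground mask, then cluster with k-means (k = 5) so that the cluster centers act as the class's mean shapes. The cropping and resizing details are assumptions; the exact preprocessing used for the figure may differ.

# Sketch of the mean-shape extraction for one ImageNet class.
import numpy as np
from sklearn.cluster import KMeans

def mean_shapes(masks, k=5, size=64):
    """masks: list of binary (H, W) arrays for one class."""
    crops = []
    for m in masks:
        ys, xs = np.nonzero(m)
        if len(ys) == 0:
            continue
        crop = m[ys.min():ys.max() + 1, xs.min():xs.max() + 1].astype(np.float32)
        # Nearest-neighbor resize of the cropped mask to a fixed grid.
        yi = np.linspace(0, crop.shape[0] - 1, size).astype(int)
        xi = np.linspace(0, crop.shape[1] - 1, size).astype(int)
        crops.append(crop[np.ix_(yi, xi)].ravel())
    km = KMeans(n_clusters=k, n_init=10).fit(np.stack(crops))
    # Each cluster center, reshaped, is one "mean shape" of the class.
    return km.cluster_centers_.reshape(k, size, size)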



ImageNet Segmentation Benchmark

We introduce a benchmark with a suite of segmentation challenges using our Synthetic-annotated dataset (5k) as training set and evaluate on our Real-annotated held-out dataset (8k). Specifically, we evaluate performance for (1) two individual classes (dog and bird), (2) foreground/background (FG/BG) segmentation evaluated across all 1k classes, and (3) multi-class semantic segmentation for various subsets of classes.
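Segmentation performance throughout the benchmark is reported as mean intersection-over-union (mIoU). The snippet below gives the standard confusion-matrix formulation of this metric; it is a generic implementation, not the benchmark's exact evaluation script.

# Generic mIoU computation via an accumulated confusion matrix.
import numpy as np

def mean_iou(preds, gts, num_classes):
    """preds, gts: iterables of (H, W) integer label maps."""
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    for p, g in zip(preds, gts):
        idx = g.ravel() * num_classes + p.ravel()
        conf += np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)
    inter = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter
    return (inter / np.maximum(union, 1)).mean()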

ImageNet pixel-wise benchmark. Numbers are mIoU. We compare various methods on several tasks, with supervised and self-supervised pre-training. We use ResNet-50 for all methods and ablate the use of synthetic datasets for three of them. FG/BG evaluates binary segmentation across all classes; MC-N columns evaluate multi-class segmentation performance in setups with N classes. Adding synthetic datasets improves performance by a large margin. BigGAN-off and BigGAN-on compare the offline & online sampling strategies.



ImageNet Segmentation Visualization


Qualitative results on MC-128. We visualize predictions (second column) of DeepLab trained on our BigGAN-sim dataset, compared to ground-truth annotations (third column). The final row shows typical failure cases, which include objects with multiple parts, thin structures, or complicated scenes.



ImageNet Segmentation vs. Classification Analysis


Top-5 analysis of the ImageNet benchmark. Text below each image indicates: class name, FG/BG segmentation measured in mIoU, and classification accuracy of a ResNet-50 pre-trained on ImageNet. Top row: the Top-5 best predictions of DeepLabv3 trained on the BigGAN-sim dataset for the FG/BG task, compared to ground-truth annotations. Bottom row: the Top-5 worst predictions. Typical failure cases include small objects and thin structures. Interestingly, classes that are hard to segment, such as basketball and bow, are not necessarily hard to classify.



ImageNet Segmentation Ablation Study



Ablating synthetic dataset size. We fix the model to a ResNet-50 backbone and compare performance as we increase the synthetic dataset size. The model trained on a 22k synthetic dataset outperforms the same model trained on the 2k human-annotated dataset. Another 7 points are gained when further increasing the synthetic data size from 22k to 220k. Here, 2M is the total number of samples synthesized through our online sampling strategy.

Ablating backbone size. We scale up the backbone from ResNet-50 to ResNet-101 and ResNet-152. We supervise with 2k human-annotated labels (red) and with our BigGAN-sim dataset (green), which is 100x larger. BigGAN-sim supervision leads to consistent improvements, especially for larger models.



Downstream Tasks Performance

We propose a simple architecture design to jointly train model backbones with contrastive learning and with supervision from our synthetic datasets as a pre-training step. Here we show transfer learning results for dense prediction tasks on MS-COCO, PASCAL-VOC and Cityscapes, as well as chest X-ray segmentation in the medical domain.
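A minimal sketch of such a joint objective is shown below: an InfoNCE-style contrastive loss on two augmented views plus a pixel-wise cross-entropy on synthetic (image, mask) pairs. The head designs, loss weight, and temperature are illustrative assumptions, not the exact recipe used for our pre-training.

# Hedged sketch of joint contrastive + synthetic-mask pre-training (one step).
import torch
import torch.nn.functional as F

def joint_pretrain_step(backbone, proj_head, seg_head, views, syn_images, syn_masks,
                        optimizer, lam=1.0, tau=0.2):
    # Contrastive branch: proj_head is assumed to globally pool backbone features
    # and project them to an embedding vector.
    v1, v2 = views
    z1 = F.normalize(proj_head(backbone(v1)), dim=1)
    z2 = F.normalize(proj_head(backbone(v2)), dim=1)
    logits = z1 @ z2.t() / tau                        # positives lie on the diagonal
    targets = torch.arange(z1.size(0), device=z1.device)
    loss_con = F.cross_entropy(logits, targets)

    # Supervised branch: dense logits from synthetic images, supervised by GAN masks.
    seg_logits = seg_head(backbone(syn_images))
    seg_logits = F.interpolate(seg_logits, size=syn_masks.shape[-2:],
                               mode="bilinear", align_corners=False)
    loss_seg = F.cross_entropy(seg_logits, syn_masks)

    loss = loss_con + lam * loss_seg
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()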

MS-COCO object detection & instance segmentation. Using our synthetic data during pre-training improves object detection performance by 0.4 AP^bb and instance segmentation by 0.3 AP^mk in the 1x training schedule. When training longer with the 2x schedule, our synthetic data consistently helps, improving task performance by 0.3 AP^bb and 0.2 AP^mk.




PASCAL VOC detection & semantic segmentation. For detection, we train on the trainval07+12 set and evaluate on test2007. For semantic segmentation, we train on train_aug 2012 and evaluate on val 2012. Results are averaged over 5 individual trials.





Semi-supervised chest X-ray segmentation with a frozen backbone. Performance numbers are mIoU. When using our synthetic dataset, we match the performance of the supervised and self-supervised pre-trained networks with only 1% and 5% of the labels, respectively, and achieve a big gain when using 100% of the data. Numbers are averaged over 3 independent trials.

Cityscapes instance and semantic segmentation. Training with our BigGAN-sim dataset improves AP^mk by 0.3 points on the instance segmentation task over the baseline model. However, we do not see a significant performance boost on the semantic segmentation task.
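For reference, the sketch below shows one way to implement the frozen-backbone protocol: the pre-trained encoder stays fixed and only a light segmentation head is fit on the labeled fraction of X-rays. The head design and training schedule are assumptions for illustration, not our exact setup.

# Sketch of the frozen-backbone semi-supervised protocol.
import torch
import torch.nn as nn
import torch.nn.functional as F

def train_frozen_backbone(backbone, num_classes, labeled_loader, epochs=50, lr=1e-3):
    for p in backbone.parameters():        # backbone weights stay fixed
        p.requires_grad = False
    backbone.eval()

    head = nn.Conv2d(2048, num_classes, kernel_size=1)   # assumes ResNet-50 C5 features
    opt = torch.optim.Adam(head.parameters(), lr=lr)

    for _ in range(epochs):
        for images, masks in labeled_loader:              # e.g. 1%, 5%, or 100% of labels
            with torch.no_grad():
                feats = backbone(images)                   # (B, 2048, H/32, W/32)
            logits = F.interpolate(head(feats), size=masks.shape[-2:],
                                   mode="bilinear", align_corners=False)
            loss = F.cross_entropy(logits, masks)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return head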



Paper

BigDatasetGAN:
Synthesizing ImageNet with Pixel-wise Annotations


Daiqing Li, Huan Ling, Seung Wook Kim,
Karsten Kreis, Adela Barriuso, Sanja Fidler, Antonio Torralba

[Paper]      [Benchmark and dataset] (coming soon)

For feedback and questions please reach out to Daiqing Li and Huan Ling.





Citation

If you find this work useful for your research, please consider citing it as:

@inproceedings{bigDatasetGAN,
  title         = {BigDatasetGAN: Synthesizing ImageNet with Pixel-wise Annotations},
  author        = {Daiqing Li and Huan Ling and Seung Wook Kim and Karsten Kreis and
                   Adela Barriuso and Sanja Fidler and Antonio Torralba},
  eprint        = {2201.04684},
  archivePrefix = {arXiv},
  year          = {2022}
}
    
See prior work on using GANs for downstream tasks, which BigDatasetGAN builds on:
DatasetGAN
@inproceedings{zhang21,
  title     = {DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort},
  author    = {Zhang, Yuxuan and Ling, Huan and Gao, Jun and Yin, Kangxue and Lafleche, Jean-Francois and
               Barriuso, Adela and Torralba, Antonio and Fidler, Sanja},
  booktitle = {CVPR},
  year      = {2021}
}
    
SemanticGAN
@inproceedings{semanticGAN,
  title     = {Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization},
  author    = {Li, Daiqing and Yang, Junlin and Kreis, Karsten and Torralba, Antonio and Fidler, Sanja},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2021}
}
    




Dataset Visualization

Here we show random samples from the human-annotated datasets, Real-annotated (a real ImageNet subset labeled manually) and Synthetic-annotated (BigGAN’s samples labeled manually), as well as from the synthetic BigGAN-sim and VQGAN-sim datasets generated by BigGAN and VQGAN. We also show a side-by-side comparison between the BigGAN-sim and VQGAN-sim datasets.
Examples from the Real-annotated dataset. We visualize both the segmentation masks as well as the boundary polygons.




Examples from the Synthetic-annotated dataset. We visualize both the segmentation masks as well as the boundary polygons.




Examples from the BigGAN-sim random samples. We visualize both the segmentation masks as well as the boundary polygons.




Examples from the VQGAN-sim random samples. We visualize both the segmentation masks as well as the boundary polygons.




BigGAN-sim vs VQGAN-sim. We select the same classes in each row for both BigGAN-sim and VQGAN-sim for easy comparison. Compared to BigGAN-sim, the VQGAN-sim samples are more diverse in terms of object scale, pose, and background. However, BigGAN-sim has better label quality than VQGAN-sim, whose labels in some cases have holes and are noisy.


BigGAN-sim per-class samples
VQGAN-sim per-class samples