Modern image generative models show remarkable sample quality when trained on a single domain or class of objects. In this work, we introduce a generative adversarial network that can simultaneously generate aligned image samples from multiple related domains. We leverage the fact that a variety of object classes share common attributes, with certain geometric differences. We propose Polymorphic-GAN, which learns shared features across all domains and a per-domain morph layer that deforms the shared features according to each domain's geometry. In contrast to previous works, our framework allows simultaneous modelling of images with highly varying geometries, such as images of human faces, painted and artistic faces, as well as multiple different animal faces. We demonstrate that our model produces aligned samples for all domains and show how it can be used for applications such as segmentation transfer and cross-domain image editing, as well as training in low-data regimes. Additionally, we apply Polymorphic-GAN to image-to-image translation tasks and show that we can greatly surpass previous approaches in cases where the geometric differences between domains are large.
Each row corresponds to one sample rendered in five different domains. We can see that samples are aligned in many attributes, including pose and lighting conditions. Furthermore, as Polymorphic-GAN inherits useful properties of StyleGAN, we can use existing algorithms, such as SeFa, to find edit vectors in Polymorphic-GAN's latent space. Here, we gradually apply a discovered edit vector to each row, visualizing how edits can be successfully broadcast to all domains.
Yujun Shen and Bolei Zhou. Closed-Form Factorization of Latent Semantics in GANs. CVPR 2021.
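Broadcasting an edit across domains can be sketched as follows. This is a minimal illustration, not the released implementation: the latent dimensionality, the unit-norm direction, and the edit-strength range are all assumptions, and the SeFa factorization itself (an eigendecomposition of generator weights) is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: one shared latent code w and one edit direction v
# (in the real model, v would come from a SeFa-style factorization).
w = rng.normal(size=512)          # shared latent code for one sample
v = rng.normal(size=512)
v /= np.linalg.norm(v)            # edit directions are typically unit-norm

# Because every per-domain branch consumes the *same* shared latent, the
# shifted code w + a*v applies the edit (e.g. a pose change) consistently
# across all domains at once.
alphas = np.linspace(-3.0, 3.0, 5)              # gradual edit strengths
edited = np.stack([w + a * v for a in alphas])  # (5, 512) latent codes
```

Feeding each row of `edited` through the generator would produce one progressively edited image per domain; the middle row (strength 0) reproduces the original sample.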
With the learned morph maps, we can perform zero-shot segmentation transfer. We first run an off-the-shelf segmentation network on the parent-domain image to obtain a mask, and then apply the warping operation with the morph maps directly to the segmentation mask to obtain the corresponding masks for the other domains.
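The warping step can be sketched as a per-pixel flow applied with nearest-neighbour sampling, which keeps the mask labels discrete. The `(dy, dx)` offset parameterisation below is an assumption for illustration; the paper's morph maps may be parameterised differently (e.g. as sampling grids for `grid_sample`).

```python
import numpy as np

def warp_mask(mask, morph_map):
    """Warp a segmentation mask with a per-pixel offset field (morph map).

    mask:      (H, W) integer label map from the parent domain.
    morph_map: (H, W, 2) assumed offsets (dy, dx); each output pixel
               samples the parent mask at (y + dy, x + dx), clipped to
               the image bounds. Nearest-neighbour sampling preserves
               discrete labels.
    """
    H, W = mask.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    src_y = np.clip(np.rint(ys + morph_map[..., 0]), 0, H - 1).astype(int)
    src_x = np.clip(np.rint(xs + morph_map[..., 1]), 0, W - 1).astype(int)
    return mask[src_y, src_x]

# Toy example: a flow that samples one row above shifts the mask down.
mask = np.zeros((4, 4), dtype=int)
mask[0] = 1                                    # label 1 on the top row
flow = np.zeros((4, 4, 2))
flow[..., 0] = -1                              # sample one row above
warped = warp_mask(mask, flow)                 # label 1 now covers row 1
```

The same operation applied to the real morph maps carries the parent-domain mask onto each child domain's geometry.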
On the right, we show interpolation between two random samples. The segmentation masks are transferred from the first column, on which we run the segmentation network. For faces, as animals' noses are always mapped several pixels below humans' noses, we add a fixed offset value to the morph maps for animals (see Appendix C.4 for details).
Polymorphic-GAN learns to disentangle the geometric differences from the texture differences between domains in an unsupervised fashion. This allows us to selectively mix and match morph maps and rendering layers. The left figure shows how we can perform a novel type of image editing by changing only the texture of images while keeping their shapes fixed. This is done by fixing the morph maps while interpolating the weights of the rendering layers of different domains.
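The texture-only edit reduces to a linear interpolation of the per-domain rendering weights while the morph map stays frozen. The sketch below assumes the rendering layers can be represented as dictionaries of weight arrays; the shapes and names are hypothetical.

```python
import numpy as np

def lerp_weights(weights_a, weights_b, t):
    """Linearly interpolate two rendering-layer weight dicts (0 <= t <= 1)."""
    return {k: (1.0 - t) * weights_a[k] + t * weights_b[k] for k in weights_a}

rng = np.random.default_rng(0)
# Hypothetical per-domain rendering weights (e.g. a 1x1 conv projecting
# 64 shared feature channels to RGB).
render_human = {"to_rgb": rng.normal(size=(3, 64))}
render_cat   = {"to_rgb": rng.normal(size=(3, 64))}

# Texture-only edit: the morph map (geometry) is held fixed while the
# rendering weights (texture) are blended between the two domains.
morph_map = rng.normal(size=(8, 8, 2))   # frozen: shape stays the same
mixed = lerp_weights(render_human, render_cat, t=0.5)
```

Rendering the fixed, morphed features with `mixed` would yield an image whose shape matches the human domain but whose texture sits halfway toward the cat domain.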
We can exploit the domain-specific geometries learned in Polymorphic-GAN's morph maps to modify images from one domain into another, for example, from SUV to Sports Car. Specifically, we interpolate the morph maps of two domains while keeping the rendering layers fixed. This lets us see the geometric characteristics of each domain learned by Polymorphic-GAN. The bottom-left image shows how the shape of a human face is transformed into a cat-shaped face using the morph map from the Cat domain. For the bottom-right image, we transform a cat using a morph map from the Wild Life domain: the cat's eyes become smaller and its nose gets longer, resembling wild animals such as lions.
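The shape edit is the mirror image of the texture edit: a convex blend of the two domains' morph maps with the rendering layers untouched. Again a hedged sketch with hypothetical shapes, treating morph maps as dense flow fields.

```python
import numpy as np

def mix_morph_maps(morph_a, morph_b, t):
    """Convex blend of two flow fields: t=0 keeps domain A's geometry,
    t=1 fully adopts domain B's; intermediate t gives in-between shapes."""
    return (1.0 - t) * morph_a + t * morph_b

rng = np.random.default_rng(1)
morph_human = rng.normal(size=(8, 8, 2))   # hypothetical human-face flow
morph_cat   = rng.normal(size=(8, 8, 2))   # hypothetical cat-face flow

# Sweep t from 0 to 1: the geometry morphs from human toward cat while
# the fixed human rendering layers keep the texture unchanged.
steps = [mix_morph_maps(morph_human, morph_cat, t) for t in (0.0, 0.5, 1.0)]
```

Warping the shared features with each blended map, then rendering with one domain's layers, produces the human-to-cat shape transformations shown in the figure.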