Abstract
We propose a neural talking-head video synthesis model and demonstrate its application to video conferencing. Our model learns to synthesize a talking-head video from a source image containing the target person's appearance and a driving video that dictates the motion in the output. The motion is encoded with a novel keypoint representation, in which identity-specific and motion-related information is decomposed in an unsupervised manner. Extensive experimental validation shows that our model outperforms competing methods on benchmark datasets. Moreover, our compact keypoint representation enables a video conferencing system that achieves the same visual quality as the commercial H.264 standard while using only one-tenth of the bandwidth. Furthermore, we show that our keypoint representation allows the user to rotate the head during synthesis, which is useful for simulating a face-to-face video conferencing experience.
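To make the conferencing use case concrete, below is a minimal sketch of the sender/receiver split that the compact keypoint representation makes possible. All module names, shapes, and the keypoint count are hypothetical stand-ins (tiny stubs, not the released networks); the actual architecture is described in the paper.

import torch
import torch.nn as nn

NUM_KP = 20  # assumed keypoint count; the real value is a model hyperparameter

class KeypointDetector(nn.Module):
    """Stub standing in for the unsupervised 3D keypoint network."""
    def forward(self, frame):                    # frame: (1, 3, H, W)
        return torch.zeros(1, NUM_KP, 3)         # K keypoints, 3D coordinates

class Generator(nn.Module):
    """Stub standing in for the warping and rendering network."""
    def forward(self, source_image, source_kp, driving_kp):
        return source_image                      # stand-in for the synthesized frame

detector, generator = KeypointDetector(), Generator()

# Sender: transmit the source image once, then only keypoints per frame.
source_image = torch.rand(1, 3, 256, 256)
source_kp = detector(source_image)

def send(driving_frame):
    return detector(driving_frame)               # NUM_KP * 3 floats per frame

# Receiver: reconstruct every frame from the stored source image.
def receive(driving_kp):
    return generator(source_image, source_kp, driving_kp)

frame = receive(send(torch.rand(1, 3, 256, 256)))

Under these assumptions, the per-frame payload is on the order of 20 keypoints x 3 coordinates x 4 bytes x 30 fps, roughly 7 KB/s before any entropy coding, whereas a conventional codec must encode the full driving frame; this is where the bandwidth saving comes from.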
Resources
Paper
The full paper is available on arXiv (2011.15126). A preprint PDF is also provided for convenience.
Code
This work is built on Imaginaire, our open-source PyTorch library for generative models. The codebase is available for non-commercial use.
Dataset
The TalkingHead-1KH dataset we collected to train the model can be downloaded from GitHub. It contains over 1,000 hours of talking-head videos sourced from YouTube. Note that the number of videos may differ from the count reported in the paper, because a different preprocessing script was used to split the source videos into clips.
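For illustration, a clip-splitting step of the kind such a preprocessing script performs might look like the following; the clip length, paths, and helper name are hypothetical, and the actual script in the dataset repository may differ.

import subprocess

def split_video(src_path, out_pattern, clip_seconds=60):
    """Cut src_path into consecutive clips of clip_seconds using ffmpeg."""
    subprocess.run([
        "ffmpeg", "-i", src_path,
        "-c", "copy",                     # stream copy, no re-encoding
        "-f", "segment",                  # ffmpeg's segment muxer
        "-segment_time", str(clip_seconds),
        "-reset_timestamps", "1",
        out_pattern,                      # e.g. "clips/out_%03d.mp4"
    ], check=True)

split_video("raw/talk.mp4", "clips/out_%03d.mp4")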
Demo
An interactive online demo is available at this page. The demo lets you upload a source portrait image and drive the synthesized talking head with a video, or rotate the head to any desired angle.
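Because head pose is an explicit factor in the keypoint representation, free-view synthesis amounts to editing the rotation applied to the 3D keypoints before rendering. The sketch below illustrates the idea; the keypoint shape and the rotate_keypoints helper are hypothetical, not the demo's actual interface.

import math
import torch

def yaw_matrix(degrees):
    """Rotation about the vertical axis by the given angle."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return torch.tensor([[ c, 0.0, s],
                         [0.0, 1.0, 0.0],
                         [-s, 0.0, c]])

def rotate_keypoints(keypoints, degrees):
    # Rotating the keypoints changes the rendered head pose while
    # leaving identity and expression untouched.
    return keypoints @ yaw_matrix(degrees).T

keypoints = torch.rand(20, 3)                  # K = 20 keypoints, 3D coordinates
rotated = rotate_keypoints(keypoints, 30.0)    # view the head 30 degrees to the side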
Results & Demonstrations
Example Results
Video Reconstruction
Head Rotation
Face Frontalization
Motion Transfer
Citation
@inproceedings{wang2021facevid2vid,
  title={One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing},
  author={Wang, Ting-Chun and Mallya, Arun and Liu, Ming-Yu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021}
}