[Teaser video: animation from source images and a driving video, comparing FOMM, fv2v, and Ours]

Abstract

We present a new implicit warping framework for image animation using sets of source images through the transfer of the motion of a driving video. A single cross-modal attention layer is used to find correspondences between the source images and the driving image, choose the most appropriate features from different source images, and warp the selected features. This is in contrast to the existing methods that use explicit flow-based warping, which is designed for animation using a single source and does not extend well to multiple sources. The pick-and-choose capability of our framework helps it achieve state-of-the-art results on multiple datasets for image animation using both single and multiple source images.

A single image often cannot fully describe the subject due to occlusions, limited pose information, etc. Diverse source images provide more appearance information and reduce the burden of hallucination. Implicit warping makes it possible to pick and choose features from multiple source images when producing the output.
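The pick-and-choose behavior comes from a single cross-modal attention layer: queries derived from the driving image attend over keys and values pooled from all source images at once. The following is a minimal numpy sketch under assumed shapes and standard scaled dot-product attention; the function names and the exact attention form are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def implicit_warp(driving_q, source_k, source_v):
    """Illustrative single cross-modal attention layer (sketch, not the
    paper's exact architecture).

    driving_q: (Nq, d)  query features from the driving image
    source_k:  (Ns, d)  keys from ALL source images, concatenated
    source_v:  (Ns, dv) appearance features (values) from all sources

    Each output location attends over every location of every source
    image, so features are selected from whichever source matches best;
    no per-source explicit flow field is required.
    """
    d = driving_q.shape[-1]
    attn = softmax(driving_q @ source_k.T / np.sqrt(d))  # (Nq, Ns)
    return attn @ source_v                               # (Nq, dv)
```

Because keys and values from multiple sources are simply concatenated along the Ns axis, adding another source image changes nothing architecturally, which is the property single-source flow-based methods lack.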

Comparison with Prior Work

Single-source prior works such as FOMM, AA-PCA, and fv2v rely on explicit flow-based warping of the source image, conditioned on the pose of the driving image. Due to this architectural choice, they must be modified in ad-hoc ways to take advantage of multiple source images.
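To see why explicit warping is tied to a single source, consider its basic form: a flow field predicted from the driving pose tells each output pixel which source location to read. A minimal nearest-neighbor sketch (simplified; real methods such as FOMM use bilinear sampling and learned flows):

```python
import numpy as np

def flow_warp_nn(source, flow):
    """Explicit backward warping with nearest-neighbor sampling (sketch).

    source: (H, W)     a SINGLE source feature map
    flow:   (H, W, 2)  per-pixel (dy, dx) offsets predicted from the
                       driving pose; each output pixel reads exactly one
                       source location.
    """
    H, W = source.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Round to the nearest source pixel and clip at the borders.
    sy = np.clip(np.round(ys + flow[..., 0]).astype(int), 0, H - 1)
    sx = np.clip(np.round(xs + flow[..., 1]).astype(int), 0, W - 1)
    return source[sy, sx]
```

With K source images, this formulation needs K separate flow fields plus a heuristic rule for merging the K warped results, which is the ad-hoc modification the comparison above refers to.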

Single-Source Comparison

[Video: single-source comparison — source images, driving video, Ours (single source), FOMM, Ours]

Multi-Source Comparison

[Video: multi-source comparison given multiple source images]

In the last column of each comparison, we present results from implicit warping, which addresses the issues raised above. The cross-modal attention layer selects the appropriate features from the available sources and produces an output free of artifacts. Additional results and visualizations are available at this link.

Citation

@inproceedings{mallya2022implicit,
  title={Implicit Warping for Animation with Image Sets},
  author={Mallya, Arun and Wang, Ting-Chun and Liu, Ming-Yu},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2022}
}