Implicit Warping for Animation with Image Sets

We present a new implicit warping framework for image animation that transfers the motion of a driving video to a set of source images. A single cross-modal attention layer finds correspondences between the source images and the driving image, chooses the most appropriate features from the different source images, and warps the selected features. This contrasts with existing methods that use explicit flow-based warping, which is designed for animation from a single source image and does not extend well to multiple sources. The pick-and-choose capability of our framework helps it achieve state-of-the-art results on multiple datasets for image animation using both single and multiple source images.
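To illustrate the pick-and-choose idea, here is a minimal sketch of scaled dot-product cross-attention over features pooled from several source images. It is not the architecture used in this work: the function name `implicit_warp`, the list-of-vectors feature layout, and the toy inputs are all assumptions made for illustration. Queries stand in for driving-image locations, while keys and values stand in for source-image locations.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def implicit_warp(queries, source_keys, source_values):
    """Hypothetical helper: attend from driving queries to all source locations.

    queries:       list of query vectors, one per driving-image location.
    source_keys:   one list per source image, each holding per-location key vectors.
    source_values: same layout as source_keys, holding per-location feature vectors.
    Returns one output feature vector per query.
    """
    # Pool every location of every source image into a single set, so the
    # attention weights can favor whichever source matches best.
    keys = [k for image in source_keys for k in image]
    values = [v for image in source_values for v in image]
    scale = math.sqrt(len(keys[0]))
    value_dim = len(values[0])

    warped = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # The weighted sum of source features plays the role of the warp:
        # no explicit flow field is estimated.
        warped.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(value_dim)
        ])
    return warped


# Toy example (assumed data): two source images with two locations each.
source_keys = [
    [[10.0, 0.0, 0.0, 0.0], [0.0, 10.0, 0.0, 0.0]],  # source image A
    [[0.0, 0.0, 10.0, 0.0], [0.0, 0.0, 0.0, 10.0]],  # source image B
]
source_values = [
    [[1.0, 0.0], [0.0, 1.0]],   # features of source image A
    [[2.0, 2.0], [3.0, -1.0]],  # features of source image B
]
# This query matches the first location of source image B.
queries = [[0.0, 0.0, 10.0, 0.0]]
print(implicit_warp(queries, source_keys, source_values))
```

Because the softmax runs over locations from all source images at once, adding another source only lengthens the key and value lists. An explicit flow field, by contrast, is defined per source image, so multiple sources would each need their own flow and a separate step to combine the warped results.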

Figure: Driving video reconstruction with multiple source images. Columns, left to right: source images, driving video, FOMM, fv2v, ours.