Flexible Motion In-betweening with Diffusion Models
Motion in-betweening, a fundamental technique in animation, has long been recognized as a labor-intensive and challenging process. We investigate the potential of diffusion models for generating diverse human motions guided by keyframes. Unlike previous in-betweening methods, we propose a simple unified model capable of generating precise and diverse motions that conform to a flexible range of user-specified constraints, as well as to text conditioning. To this end, we propose Conditional Motion Diffusion In-betweening (CondMDI), which allows for arbitrarily dense or sparse keyframe placement and partial keyframe constraints, and which generates high-quality motions that are both diverse and coherent with the given keyframes. We further explore the use of guidance and imputation-based methods for inference-time keyframing. We evaluate the performance of our diffusion-based in-betweening method on the text-conditioned HumanML3D dataset and demonstrate the versatility and efficacy of diffusion models for keyframe in-betweening.
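The imputation-based keyframing mentioned above can be illustrated with a minimal sketch: at each denoising step, the entries of the sample corresponding to observed keyframes are overwritten with a noised copy of the known values, so the generated motion stays consistent with the constraints. This is a generic illustration of diffusion imputation, not the paper's actual model; the keyframe values, noise schedule, and `denoise_step` stand-in below are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a "motion" is T frames x D features; a few frames are keyframed.
T, D = 8, 4
keyframes = {0: np.zeros(D), 7: np.ones(D)}  # hypothetical constraints
mask = np.zeros((T, D), dtype=bool)
known = np.zeros((T, D))
for t, pose in keyframes.items():
    mask[t] = True
    known[t] = pose

# Placeholder linear noise schedule (a trained model would define its own).
steps = 50
alphas = np.linspace(0.99, 0.90, steps)
alpha_bars = np.cumprod(alphas)

def denoise_step(x, i):
    """Stand-in for a learned denoiser: shrinks x toward zero.
    A real model would predict the noise (or clean sample) from x and i."""
    return x * np.sqrt(alphas[i])

x = rng.standard_normal((T, D))
for i in reversed(range(steps)):
    x = denoise_step(x, i)
    # Imputation: replace the known entries with the keyframe values
    # noised to the current level, so unknown frames are generated
    # while the constrained frames track their targets.
    noised_known = (np.sqrt(alpha_bars[i]) * known
                    + np.sqrt(1.0 - alpha_bars[i]) * rng.standard_normal((T, D)))
    x = np.where(mask, noised_known, x)

# Final hard projection onto the keyframe constraints.
x = np.where(mask, known, x)
```

Guidance-based variants would instead add a gradient of a keyframe-matching loss to each denoising step rather than overwriting entries directly.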