Listen to a high-level overview of our MaskedMimic system in action.
Crafting a single, versatile physics-based controller that can breathe life into interactive characters across a wide spectrum of scenarios represents an exciting frontier in character animation. An ideal controller should support diverse control modalities, such as sparse target keyframes, text instructions, and scene information. While previous works have proposed physically simulated, scene-aware control models, these systems have predominantly focused on developing controllers, each specializing in a narrow set of tasks and control modalities. This work presents MaskedMimic, a novel approach that formulates physics-based character control as a general motion inpainting problem. Our key insight is to train a single unified model to synthesize motions from partial (masked) motion descriptions, such as masked keyframes, objects, text descriptions, or any combination thereof. This is achieved by leveraging motion tracking data and designing a scalable training method that can effectively utilize diverse motion descriptions to produce coherent animations. Through this process, our approach learns a physics-based controller that provides an intuitive control interface without requiring tedious reward engineering for all behaviors of interest. The resulting controller supports a wide range of control modalities and enables seamless transitions between disparate tasks. By unifying character control through motion inpainting, MaskedMimic creates versatile virtual characters. These characters can dynamically adapt to complex scenes and compose diverse motions on demand, enabling more interactive and immersive experiences.
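To make the inpainting formulation concrete, below is a minimal sketch of the kind of random masking such training relies on: future target keyframes, individual joints within a keyframe, and the text and object channels are independently dropped, and the model must inpaint a coherent full-body motion from whatever remains. This is not the authors' implementation; the shapes, dropout probabilities, and dictionary layout are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def sample_masked_conditioning(target_poses, text_emb, object_emb,
                               p_joint=0.5, p_frame=0.5, p_text=0.5, p_object=0.5):
    """target_poses: (num_future_frames, num_joints, pose_dim) array of future targets."""
    num_frames, num_joints, _ = target_poses.shape

    # A joint target stays visible only if both its frame and the joint itself
    # survive the random dropout; everything else must be inpainted by the policy.
    keep_frame = rng.random((num_frames, 1)) > p_frame
    keep_joint = rng.random((num_frames, num_joints)) > p_joint
    joint_mask = keep_frame & keep_joint

    keep_text = rng.random() > p_text
    keep_object = rng.random() > p_object

    return {
        "targets": target_poses * joint_mask[..., None],  # hidden targets zeroed out
        "target_mask": joint_mask,                        # which joint targets are observed
        "text": text_emb * keep_text,
        "text_mask": keep_text,
        "object": object_emb * keep_object,
        "object_mask": keep_object,
    }

# Example: 2 future target frames, 24 joints, 12-D pose features per joint.
cond = sample_masked_conditioning(rng.standard_normal((2, 24, 12)),
                                  text_emb=rng.standard_normal(512),
                                  object_emb=rng.standard_normal(256))
print(cond["target_mask"].mean())  # fraction of joint targets left visible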
The first application we consider is motion tracking. Here, MaskedMimic is provided a set of target joint positions and/or orientations and must generate a full-body motion that is consistent with these constraints. Common examples of such tasks include scene retargeting, where the goal is to reproduce a reference motion in a new scene, and VR tracking, where the goal is to generate plausible full-body motion inferred from the sensors on a VR headset and hand controllers.
Given motion capture recordings performed on flat terrain, our method can reconstruct them across a wide range of irregular terrains.
Given constraints on only a subset of the joints, our method reconstructs plausible full-body motions. Here we showcase tracking a running motion from head-only constraints, and a cartwheel motion from head and hand constraints (akin to VR tracking).
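For a concrete picture, here is a small sketch of how such sparse tracking targets could be packed at inference time, with only the headset and hand-controller joints constrained. The joint names, pose layout, and helper function are hypothetical assumptions, not the system's actual interface.

import numpy as np

JOINT_NAMES = ["pelvis", "head", "left_hand", "right_hand", "left_foot", "right_foot"]
POSE_DIM = 7  # xyz position + wxyz orientation per joint

def build_tracking_goal(observed_targets):
    """observed_targets: dict mapping joint name -> length-7 target pose."""
    targets = np.zeros((len(JOINT_NAMES), POSE_DIM))
    mask = np.zeros(len(JOINT_NAMES), dtype=bool)
    for name, pose in observed_targets.items():
        idx = JOINT_NAMES.index(name)
        targets[idx] = pose
        mask[idx] = True
    return targets, mask

# Head-and-hands constraints, akin to a VR headset plus two controllers.
vr_goal, vr_mask = build_tracking_goal({
    "head":       [0.0, 0.0, 1.7, 1.0, 0.0, 0.0, 0.0],
    "left_hand":  [0.3, 0.2, 1.2, 1.0, 0.0, 0.0, 0.0],
    "right_hand": [0.3, -0.2, 1.2, 1.0, 0.0, 0.0, 0.0],
})
print(vr_mask)  # only the head and hand joints are marked as observed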
Our system can also be driven by user-specified constraints, an approach we call goal-engineering: the user defines simple logical constraints describing what the character should do, and our system generates a motion that satisfies the goal.
By constraining the head position (x, y, z) and orientation (w, x, y, z), we can generate a wide variety of locomotion behaviors. The controller is provided with a set of near-term frames and a single long-term frame to enable long-term planning.
By providing a constraint (right_hand, x, y, z, t), the controller is instructed to reach the target location within t frames.
By constraining the pelvis position and orientation, we can generate game-controller-like steering motions.
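The sketch below illustrates the goal-engineering idea under assumed shapes: a handful of sparse (joint, position, frame-index) constraints, such as the (right_hand, x, y, z, t) reach goal above, are packed into the same masked keyframe format the controller consumes. The helper function, joint list, and horizon length are hypothetical.

import numpy as np

JOINT_NAMES = ["pelvis", "head", "right_hand"]
HORIZON = 90  # number of future frames the controller can be conditioned on

def pack_goals(constraints):
    """constraints: list of (joint_name, (x, y, z), frame_index) tuples."""
    targets = np.zeros((HORIZON, len(JOINT_NAMES), 3))
    mask = np.zeros((HORIZON, len(JOINT_NAMES)), dtype=bool)
    for joint, xyz, t in constraints:
        j = JOINT_NAMES.index(joint)
        targets[t, j] = xyz
        mask[t, j] = True
    return targets, mask

# "Reach the target with the right hand within 60 frames, while steering the
# pelvis toward a waypoint" expressed as two sparse keyframe constraints.
goals, goal_mask = pack_goals([
    ("right_hand", (0.8, 0.1, 1.4), 60),
    ("pelvis",     (2.0, 0.0, 0.9), 89),  # single long-term frame for planning
])
print(int(goal_mask.sum()), "of", goal_mask.size, "joint targets constrained")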
Text is another form of partial constraint extracted from the data. These motions were generated by providing MaskedMimic with a text description of the desired motion, without any joint constraints.
Motions involving scene objects can be leveraged to learn object-specific behaviors. Here we showcase motions generated by MaskedMimic when instructed to "interact with that object". Conditioned on the object, MaskedMimic generates object-interaction motions that are consistent with the object's physical properties.
Our unified controller exhibits cross-task generalization: although object-interaction motions were only observed on flat terrain during training, MaskedMimic generalizes to unseen objects placed on irregular terrain.
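These modalities fit the same masked interface as keyframes. Below is a minimal sketch, under assumed shapes, of a goal in which every joint target is masked out and only a text embedding and/or a coarse object descriptor is provided; the descriptor fields (class id, position, yaw, extents) are illustrative assumptions rather than the paper's actual object encoding.

import numpy as np

NUM_JOINTS, POSE_DIM, HORIZON = 24, 12, 30

def modality_only_goal(text_emb=None, object_desc=None):
    """Build a goal where every joint target is masked out and only the text
    and/or object channel carries information."""
    return {
        "targets": np.zeros((HORIZON, NUM_JOINTS, POSE_DIM)),
        "target_mask": np.zeros((HORIZON, NUM_JOINTS), dtype=bool),  # nothing is tracked
        "text": text_emb if text_emb is not None else np.zeros(512),
        "text_mask": text_emb is not None,
        "object": object_desc if object_desc is not None else np.zeros(8),
        "object_mask": object_desc is not None,
    }

# "Interact with that object": a sofa described by a coarse descriptor
# (class id, position xyz, yaw, extents xyz), with no joint constraints at all.
sofa = np.array([3.0,  1.5, 0.0, 0.42,  0.0,  1.8, 0.8, 0.75])
goal = modality_only_goal(object_desc=sofa)
print(goal["object_mask"], goal["target_mask"].any())  # True False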
@article{tessler2024maskedmimic,
  author    = {Tessler, Chen and Guo, Yunrong and Nabati, Ofir and Chechik, Gal and Peng, Xue Bin},
  title     = {MaskedMimic: Unified Physics-Based Character Control Through Masked Motion Inpainting},
  journal   = {ACM Transactions on Graphics (TOG)},
  year      = {2024},
  publisher = {ACM New York, NY, USA}
}
MaskedMimic: Unified Physics-Based Character Control Through Masked Motion Inpainting
Chen Tessler, Yunrong Guo, Ofir Nabati, Gal Chechik, and Xue Bin Peng