Abstract
Crafting a single, versatile physics-based controller that can breathe life into interactive characters
across a wide spectrum of scenarios represents an exciting frontier in character animation.
An ideal controller should support diverse control modalities, such as sparse target keyframes, text
instructions, and scene information. While previous works have proposed physically simulated, scene-aware
control models, these systems have predominantly focused on developing controllers, each specialized for
a narrow set of tasks and control modalities.
This work presents MaskedMimic, a novel approach that formulates physics-based character control as a
general
motion inpainting problem. Our key insight is to train a single unified model to synthesize motions from
partial (masked) motion descriptions, such as masked keyframes, objects, text descriptions, or any
combination thereof. This is achieved by leveraging motion tracking data and designing a scalable training
method that can effectively utilize diverse motion descriptions to produce coherent animations.
Through this process, our approach learns a physics-based controller that provides an intuitive control
interface without requiring tedious reward engineering for all behaviors of interest. The resulting
controller supports a wide range of control modalities and enables seamless transitions between disparate
tasks.
By unifying character control through motion inpainting, MaskedMimic creates versatile virtual characters.
These
characters can dynamically adapt to complex scenes and compose diverse motions on demand, enabling more
interactive and immersive experiences.
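To make the inpainting formulation concrete, the following is a minimal sketch of how a full motion description could be randomly masked during training. The function and field names are illustrative assumptions made for this sketch, not the interface of our released code.

# Minimal sketch (not the released implementation) of masked conditioning:
# parts of a full motion description are randomly hidden, and the model must
# synthesize a motion consistent with whatever remains.
import numpy as np

rng = np.random.default_rng(0)

def mask_motion_description(joint_targets, text=None, object_info=None, keep_prob=0.5):
    """Randomly drop conditioning signals to form a partial (masked) description."""
    # Keep each per-joint target independently with probability keep_prob.
    masked_joints = {name: pose for name, pose in joint_targets.items()
                     if rng.random() < keep_prob}
    # Entire modalities (text caption, scene object) may also be dropped.
    masked_text = text if rng.random() < keep_prob else None
    masked_object = object_info if rng.random() < keep_prob else None
    return masked_joints, masked_text, masked_object

# A full description: per-joint targets, a caption, and a scene object.
full_joints = {"head": np.zeros(7), "right_hand": np.zeros(7), "pelvis": np.zeros(7)}
print(mask_motion_description(full_joints, "a person walks forward", {"type": "chair"}))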
Results
A single unified MaskedMimic controller can be conditioned on a variety of inputs to produce a
wide range of behaviors.
In the following videos we showcase some of the diverse behaviors enabled by MaskedMimic. All
videos were generated using the same underlying controller.
Motion Tracking
The first application we consider is motion tracking. Here, MaskedMimic is provided a set of target
joint positions and/or orientations and must generate a full-body motion that is consistent with these
constraints.
Common examples of such a task include scene retargeting, where the goal is to reproduce a reference
motion in a new scene, and VR tracking, where the goal is to generate plausible full-body motion inferred
from sensors located on a VR headset and hand controllers.
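As an illustration of how such tracking targets can be specified, here is a hypothetical Python structure for per-joint constraints; the JointTarget type and its field names are assumptions made for this sketch, not the interface of our implementation.

# Hypothetical representation of tracking constraints: each entry names a
# joint, a target frame, and an optional position and/or orientation.
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class JointTarget:
    joint: str                      # e.g. "head", "right_hand"
    frame: int                      # frame at which the target should be satisfied
    position: Optional[Tuple[float, float, float]] = None
    orientation: Optional[Tuple[float, float, float, float]] = None  # (w, x, y, z)

# Full-body tracking supplies targets for every joint; sparse tracking or
# scene retargeting supplies only a subset, and the controller fills in the rest.
targets = [
    JointTarget("head", frame=30, position=(0.0, 0.0, 1.7)),
    JointTarget("right_hand", frame=30, position=(0.4, 0.1, 1.2)),
]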
Full-body Tracking
Provided motion capture recordings from flat terrain, our method is able to reconstruct them across a
wide range of irregular terrains.
Sparse Tracking
Provided targets for only a subset of the joints, our method is able to reconstruct plausible full-body
motions. Here we showcase tracking a running motion from head-only constraints, and a cartwheel motion from
head and hand constraints (akin to VR tracking).
Tasks: Goal Engineering
Our system can be reused to handle user-generated constraints. We call this approach goal engineering:
the user defines simple logical constraints describing what they want the character to do, and our system
generates a motion that satisfies the goal.
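A hedged sketch of what goal engineering can look like in practice is shown below; the make_goal helper and the constraint fields are hypothetical, chosen only to illustrate how a few simple constraints form a goal.

# Illustrative only: user-defined constraints bundled into a partial motion
# description; every joint and frame not listed here stays masked.
import numpy as np

def make_goal(constraints):
    return {"constraints": constraints}

# "Be at (3, 0) within 90 frames and raise the right hand to shoulder height."
goal = make_goal([
    {"joint": "pelvis",     "frame": 90, "position": np.array([3.0, 0.0, 0.9])},
    {"joint": "right_hand", "frame": 90, "position": np.array([3.2, 0.2, 1.4])},
])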
Locomotion
By constraining the head position (x,y,z) and orientation (w,x,y,z), we can generate a wide variety of
locomotion behaviors. The controller is provided a set of near-term frames and a single long-term frame to
enable long-term planning.
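For illustration, a hypothetical helper for building such a locomotion goal is sketched below, with a few near-term head targets plus one distant target; names, frame indices, and units are assumptions for this example only.

# Sketch of a locomotion goal: near-term head targets plus one far target,
# so the controller can plan ahead rather than only chase the next pose.
import numpy as np

def head_locomotion_goal(path_xyz, headings_wxyz, near_frames=(10, 20, 30), far_frame=150):
    """path_xyz[t], headings_wxyz[t]: desired head position / orientation at frame t."""
    frames = list(near_frames) + [far_frame]
    return [{"joint": "head",
             "frame": t,
             "position": path_xyz[t],
             "orientation": headings_wxyz[t]} for t in frames]

# Example: a straight walk along +x at constant head height with a fixed heading.
T = 151
path = np.stack([np.linspace(0, 3, T), np.zeros(T), np.full(T, 1.7)], axis=1)
headings = np.tile(np.array([1.0, 0.0, 0.0, 0.0]), (T, 1))  # (w, x, y, z)
goal = head_locomotion_goal(path, headings)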
Reach
By providing a constraint (right_hand, x, y, z, t), our controller is instructed to reach the target
location within t frames.
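As a tiny illustrative example (structure and units are assumptions, not our released interface), such a reach goal could be written as:

# A single right-hand position target to be satisfied within t frames.
reach_goal = {"joint": "right_hand",
              "frame": 60,                   # t: reach the target within 60 frames
              "position": (0.5, -0.2, 1.6)}  # (x, y, z) target location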
Steering
By constraining the pelvis position and orientation, we can generate game-controller-like steering motions.
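Below is a hypothetical sketch of how a game-pad stick could be mapped to such a pelvis constraint; the function, scales, and field names are illustrative assumptions rather than our actual control mapping.

# Illustrative mapping from a stick input to a pelvis constraint: the stick
# sets the desired heading, and the target position is a short step that way.
import numpy as np

def steering_constraint(pelvis_xy, stick_x, stick_y, step=0.6, frame=15):
    heading = np.arctan2(stick_y, stick_x)                  # desired facing angle
    target_xy = pelvis_xy + step * np.array([np.cos(heading), np.sin(heading)])
    # Yaw-only orientation as a (w, x, y, z) quaternion about the vertical axis.
    quat = np.array([np.cos(heading / 2), 0.0, 0.0, np.sin(heading / 2)])
    return {"joint": "pelvis", "frame": frame,
            "position": np.append(target_xy, 0.9),          # assumed pelvis height
            "orientation": quat}

print(steering_constraint(np.zeros(2), stick_x=1.0, stick_y=0.0))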
Text Control
Text is another form of partial constraint extracted from the data. These motions were generated by
providing MaskedMimic with a text description of the motion, without any joint constraints.
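For illustration, a text-only goal simply carries a caption and no joint targets; the structure below is an assumption made for this sketch, not our actual interface.

# Text-only conditioning (illustrative structure): a caption and no joint
# targets, so everything else in the motion description remains masked.
text_goal = {"text": "a person waves with their right hand", "constraints": []}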
Object Interaction
Motions involving objects in the scene can be leveraged to learn object-specific behaviors.
Here we showcase motions generated by MaskedMimic when instructed to "interact with that object".
Conditioned on the object, MaskedMimic generates object-interaction motions that are consistent with the
object's physical properties.
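As a rough, hypothetical sketch, the object conditioning can be thought of as a coarse description of the scene object with no joint targets; the field names below are illustrative assumptions, not the features used in our implementation.

# Illustrative object conditioning: a coarse summary of the scene object and
# no joint targets, i.e. "interact with that object".
import numpy as np

object_goal = {
    "object": {
        "category": "chair",
        "position": np.array([1.5, 0.0, 0.45]),    # object root in the scene
        "bbox_extents": np.array([0.5, 0.5, 0.9]),  # rough size of the object
    },
    "constraints": [],
}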
Our unified controller shows cross-task generalization. While the object-interaction training motions were
recorded on flat terrain, MaskedMimic generalizes to unseen objects placed on irregular terrains.
Citation
@article{tessler2024maskedmimic,
  author    = {Tessler, Chen and Guo, Yunrong and Nabati, Ofir and Chechik, Gal and Peng, Xue Bin},
  title     = {MaskedMimic: Unified Physics-Based Character Control Through Masked Motion Inpainting},
  year      = {2024},
  journal   = {ACM Transactions on Graphics (TOG)},
  publisher = {ACM New York, NY, USA}
}
Paper
MaskedMimic: Unified Physics-Based Character Control Through Masked Motion Inpainting
Chen Tessler, Yunrong Guo, Ofir Nabati, Gal Chechik, and Xue Bin Peng