NVIDIA Research
MaskedMimic: Unified Physics-Based Character Control Through Masked Motion Inpainting

ACM Transactions on Graphics (Proc. SIGGRAPH Asia 2024)

We introduce MaskedMimic, a single unified controller for physically simulated humanoids. Our system is capable of generating a wide range of motions across diverse terrains from intuitive user-defined intents. In this work, we show several applications, including generating full-body motion from partial joint target positions, responding to joystick steering, engaging in object interactions, following paths, interpreting text commands, and even combining these modalities, such as executing text-stylized path following.

Audio Overview (generated using NotebookLM)


Listen to a high-level overview of our MaskedMimic system in action.

Abstract


Crafting a single, versatile physics-based controller that can breathe life into interactive characters across a wide spectrum of scenarios represents an exciting frontier in character animation. An ideal controller should support diverse control modalities, such as sparse target keyframes, text instructions, and scene information. While previous works have proposed physically simulated, scene-aware control models, these systems have predominantly focused on developing controllers, each specializing in a narrow set of tasks and control modalities. This work presents MaskedMimic, a novel approach that formulates physics-based character control as a general motion inpainting problem. Our key insight is to train a single unified model to synthesize motions from partial (masked) motion descriptions, such as masked keyframes, objects, text descriptions, or any combination thereof. This is achieved by leveraging motion tracking data and designing a scalable training method that can effectively utilize diverse motion descriptions to produce coherent animations. Through this process, our approach learns a physics-based controller that provides an intuitive control interface without requiring tedious reward engineering for all behaviors of interest. The resulting controller supports a wide range of control modalities and enables seamless transitions between disparate tasks. By unifying character control through motion inpainting, MaskedMimic creates versatile virtual characters. These characters can dynamically adapt to complex scenes and compose diverse motions on demand, enabling more interactive and immersive experiences.
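The masking idea at the heart of this formulation can be illustrated with a short sketch (our illustration, not the authors' code): during training, each sample keeps a random subset of its conditioning modalities, and the model must inpaint a full motion from whatever remains. The modality names and keep-probability below are assumptions.

import random

MODALITIES = ["target_joints", "text", "objects"]

def sample_condition_mask(keep_prob=0.5):
    """Randomly decide which conditioning modalities this sample keeps."""
    return {m: random.random() < keep_prob for m in MODALITIES}

def apply_mask(full_description, mask):
    """Hide the masked modalities; the model must still produce a full motion."""
    return {m: value if mask[m] else None for m, value in full_description.items()}

example = {"target_joints": "per-joint targets from a mocap clip",
           "text": "a person walks forward",
           "objects": "nearby scene geometry"}
masked = apply_mask(example, sample_condition_mask())
# An all-None result corresponds to unconditional motion generation.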

Results


A single unified MaskedMimic controller can be conditioned on a variety of inputs to produce a wide range of behaviors. In the following videos we showcase some of the diverse behaviors enabled by MaskedMimic. All videos were generated using the same underlying controller.

Motion Tracking

The first application we consider is motion tracking. Here, MaskedMimic is provided a set of target joint positions and/or orientations and must generate a full-body motion that is consistent with these constraints. Common examples of such tasks include scene retargeting, where the goal is to reproduce a reference motion in a new scene, and VR tracking, where the goal is to generate plausible full-body motion inferred from sensors located on a VR headset and hand controllers.

Full-body Tracking




Given motion capture recordings from flat terrain, our method is able to reconstruct them across a wide range of irregular terrains.

Sparse Tracking



Given targets for only a subset of the joints, our method is able to reconstruct plausible full-body motions. Here we showcase tracking a running motion from head-only constraints, and a cartwheel motion from head and hand constraints (akin to VR tracking).
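As a rough illustration of how such sparse targets could be specified (a sketch with assumed joint names and array shapes, not the authors' interface), the observable joints can be selected with a simple per-joint mask over the reference motion:

import numpy as np

JOINTS = ["pelvis", "head", "left_hand", "right_hand", "left_foot", "right_foot"]
NUM_FUTURE_FRAMES = 10  # how many future target frames the controller is shown (assumed)

def make_sparse_targets(reference_positions, tracked_joints):
    """Keep only the tracked joints from a reference clip; zero out the rest.

    reference_positions: (NUM_FUTURE_FRAMES, len(JOINTS), 3) target joint positions.
    tracked_joints: the joints that remain observable, e.g. {"head"} alone,
        or {"head", "left_hand", "right_hand"} for VR-style tracking.
    """
    mask = np.array([j in tracked_joints for j in JOINTS], dtype=bool)
    targets = np.where(mask[None, :, None], reference_positions, 0.0)
    return targets, np.broadcast_to(mask, (NUM_FUTURE_FRAMES, len(JOINTS)))

reference = np.zeros((NUM_FUTURE_FRAMES, len(JOINTS), 3))  # stand-in for real mocap targets
head_only, head_mask = make_sparse_targets(reference, {"head"})
vr_style, vr_mask = make_sparse_targets(reference, {"head", "left_hand", "right_hand"})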

Tasks: Goal Engineering

Our system can be reused to handle user-generated constraints. We call this approach goal-engineering. The user defines simple logical constraints for what they want the character to do, and our system generates a motion that satisfies the goal.
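As a loose illustration of goal engineering (the field names and structure below are our assumptions, not the actual interface), a goal can be thought of as a short list of per-joint constraints, each naming a joint, an optional target position or orientation, and a deadline frame:

from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class JointConstraint:
    joint: str                                                        # e.g. "head", "right_hand", "pelvis"
    position: Optional[Tuple[float, float, float]] = None             # target (x, y, z), if constrained
    orientation: Optional[Tuple[float, float, float, float]] = None   # target quaternion (w, x, y, z)
    frame: int = 0                                                    # frame by which the constraint should hold

# "Be two meters ahead, facing forward, within 60 frames" expressed as a goal:
goal = [
    JointConstraint(joint="head", position=(2.0, 0.0, 1.6), frame=60),
    JointConstraint(joint="head", orientation=(1.0, 0.0, 0.0, 0.0), frame=60),
]

The following tasks are all instances of this pattern, differing only in which joints and frames are constrained.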

Locomotion



By constraining the head position (x, y, z) and orientation (w, x, y, z), we can generate a wide variety of locomotion behaviors. The controller is provided a set of near-term target frames and a single long-term target frame to enable long-term planning.
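A minimal sketch of how such head-only targets could be constructed (assumed frame indices, walking speed, and head height; not the authors' code):

import numpy as np

def head_targets(current_xy, goal_xy, head_height=1.6,
                 near_frames=(5, 10, 15), far_frame=120, fps=30, speed=1.4):
    """Return {frame_index: (x, y, z)} head-position constraints.

    Near-term frames step toward the goal at an assumed walking speed (m/s);
    the single far frame sits at the goal itself, giving the controller a
    long-horizon anchor for planning.
    """
    current_xy, goal_xy = np.asarray(current_xy, float), np.asarray(goal_xy, float)
    direction = goal_xy - current_xy
    distance = np.linalg.norm(direction) + 1e-8
    direction = direction / distance

    targets = {}
    for f in near_frames:
        step = min(speed * f / fps, distance)
        xy = current_xy + direction * step
        targets[f] = (xy[0], xy[1], head_height)
    targets[far_frame] = (goal_xy[0], goal_xy[1], head_height)
    return targets

print(head_targets(current_xy=(0.0, 0.0), goal_xy=(3.0, 1.0)))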

Reach



By providing a constraint (right_hand, x, y, z, t), our controller is told to reach the target location within t frames.
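A tiny sketch of the same idea in code (the helper name and dictionary layout are hypothetical):

def reach_constraint(x, y, z, t, joint="right_hand"):
    """Ask the character to bring `joint` to (x, y, z) within t frames."""
    return {"joint": joint, "position": (x, y, z), "frame": t}

# Reach roughly shoulder height, half a meter ahead, within one second at 30 fps.
goal = reach_constraint(0.5, 0.0, 1.4, t=30)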

Steering



By constraining the pelvis position and orientation, we can generate game-controller-like steering motions.
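One way such a joystick-to-constraint mapping could look (all gains, the pelvis height, and the interface are illustrative assumptions, not the actual system):

import math

def joystick_to_pelvis_target(pelvis_xy, heading, stick_x, stick_y,
                              horizon_frames=15, fps=30, max_speed=2.0):
    """Map a stick reading to a pelvis constraint `horizon_frames` ahead.

    stick_y drives forward speed, stick_x turns the heading; every gain here,
    and the 0.9 m pelvis height below, is an illustrative guess.
    """
    new_heading = heading + 0.05 * stick_x * horizon_frames
    speed = max_speed * max(min(stick_y, 1.0), -1.0)
    dt = horizon_frames / fps
    x = pelvis_xy[0] + speed * dt * math.cos(new_heading)
    y = pelvis_xy[1] + speed * dt * math.sin(new_heading)
    yaw_quat = (math.cos(new_heading / 2), 0.0, 0.0, math.sin(new_heading / 2))  # (w, x, y, z)
    return {"joint": "pelvis", "position": (x, y, 0.9),
            "orientation": yaw_quat, "frame": horizon_frames}

# Push the stick forward and slightly to the right:
target = joystick_to_pelvis_target(pelvis_xy=(0.0, 0.0), heading=0.0, stick_x=0.3, stick_y=1.0)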

Text Control


"raises his hands and spins in place"
"plays the violin"

"balances on one foot"
"performs a forward kick"

Text is another form of partial constraint extracted from the data. These motions were generated by providing MaskedMimic with a text description of the motion, without any joint constraints.
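Schematically, this corresponds to a conditioning input where every other modality is masked out (the dictionary layout and the commented-out call are hypothetical):

condition = {
    "text": "raises his hands and spins in place",
    "target_joints": None,   # no keyframe constraints at all
    "objects": None,         # no scene objects
}
# motion = masked_mimic.generate(condition)   # hypothetical call, for illustration only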

Object Interaction




Motions involving scene objects can be leveraged to learn object-specific behaviors. Here we showcase motions generated by MaskedMimic when instructed to "interact with that object". Conditioned on the object, MaskedMimic generates object-interaction motions that are consistent with the object's physical properties.
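As a rough sketch, the object conditioning could be as simple as a pose plus a bounding box (the field names below are our assumptions, not the paper's representation):

from dataclasses import dataclass
from typing import Tuple

@dataclass
class ObjectCondition:
    category: str                                    # e.g. "chair", "sofa", "stool"
    position: Tuple[float, float, float]             # object root position in the scene
    orientation: Tuple[float, float, float, float]   # quaternion (w, x, y, z)
    bbox_extents: Tuple[float, float, float]         # half-sizes of the bounding box

# "Interact with that object" then amounts to conditioning on this descriptor alone.
target = ObjectCondition("chair", (1.5, 0.0, 0.45), (1.0, 0.0, 0.0, 0.0), (0.25, 0.25, 0.45))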


Our unified controller shows cross-task generalization. While object-interaction motions were only observed on flat terrain during training, MaskedMimic generalizes to unseen objects placed on irregular terrain.

Citation

@article{tessler2024maskedmimic,
    author    = {Tessler, Chen and Guo, Yunrong and Nabati, Ofir and Chechik, Gal and Peng, Xue Bin},
    title     = {MaskedMimic: Unified Physics-Based Character Control Through Masked Motion Inpainting},
    year      = {2024},
    journal   = {ACM Transactions on Graphics (TOG)},
    publisher = {ACM New York, NY, USA}
}

Paper


MaskedMimic: Unified Physics-Based Character Control Through Masked Motion Inpainting

Chen Tessler, Yunrong Guo, Ofir Nabati, Gal Chechik, and Xue Bin Peng

arXiv version
Video
BibTeX