Listen to a high-level overview of our MaskedManipulator system.
We tackle the challenge of synthesizing versatile, physically simulated human motions for full-body object manipulation. Unlike prior methods focused on detailed motion tracking, trajectory following, or teleoperation, our framework enables users to specify high-level objectives such as target object poses or body poses. To achieve this, we introduce MaskedManipulator, a generative control policy distilled from a tracking controller trained on large-scale human motion capture data. This two-stage learning process allows the system to perform complex interaction behaviors while providing intuitive user control over both character and object motions. MaskedManipulator produces goal-directed manipulation behaviors that expand the scope of interactive animation systems beyond task-specific solutions.
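As a rough illustration of the second stage, below is a minimal sketch of how a masked generative policy might be distilled from a tracking expert in a PyTorch-style setup. The names (tracker, student, env.rollout) and the 0.5 masking rate are hypothetical stand-ins for this sketch, not the paper's released code.

import torch
import torch.nn.functional as F

def distill_step(tracker, student, env, optimizer, horizon=32):
    # One schematic distillation step: roll out the pretrained tracking
    # expert on mocap-derived goals, then train the student to reproduce
    # its actions while seeing only a randomly masked subset of those goals.
    obs, full_goals = env.rollout(tracker, horizon)  # hypothetical helper
    with torch.no_grad():
        expert_actions = tracker.act(obs, full_goals)

    # Randomly hide goal entries (joint / object targets) so the student
    # learns to act under partial or absent constraints.
    keep = torch.rand(full_goals.shape[:-1]) < 0.5   # illustrative rate
    mask = keep.unsqueeze(-1)
    masked_goals = torch.where(mask, full_goals, torch.zeros_like(full_goals))

    student_actions = student.act(obs, masked_goals, keep)
    loss = F.mse_loss(student_actions, expert_actions)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

Repeating this step over many rollouts yields a single policy that can be queried with any subset of goals, from full-body tracking down to no constraints at all.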
The first application we consider is motion tracking. Here, MaskedManipulator is provided a set of target joint positions and/or orientations and must generate a full-body motion that is consistent with these constraints. A common example of such a task is scene retargeting, where the goal is to reproduce a reference motion in a new scene.
Provided motion capture recordings, our method is able to reconstruct them in a physically plausible way.
The second application we consider is sparse tracking. Here, MaskedManipulator is provided a sparse set of target joint and/or object positions and must generate a full-body motion that is consistent with these constraints. A common example of such a task is teleoperation, where the goal is to generate plausible full-body motion inferred from sensors located on a VR headset and hand controllers.
Provided only a subset of the joints, our method is able to reconstruct plausible full-body motions. Here we showcase tracking from head, hand, and object constraints (akin to VR tracking).
MaskedManipulator can be conditioned on future object positions. Here, the goal is to move the object to the target position by the specified time.
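To make this kind of conditioning concrete, here is one illustrative way a sparse, time-stamped goal could be assembled before being passed to the policy. The joint indices, array layout, and field names below are assumptions for the sketch, not the system's actual interface.

import numpy as np

NUM_JOINTS = 24                     # assumed skeleton size
SLOTS = NUM_JOINTS + 1              # one extra slot for the object
goal = np.zeros((SLOTS, 3))         # xyz target per slot
mask = np.zeros(SLOTS, dtype=bool)  # which targets are constrained

HEAD, LEFT_HAND, RIGHT_HAND, OBJECT = 0, 1, 2, NUM_JOINTS  # illustrative indices

# VR-style sparse tracking: only head and hand positions are specified.
for idx, pos in [(HEAD,       [0.0,   0.0,  1.70]),
                 (LEFT_HAND,  [0.30,  0.20, 1.20]),
                 (RIGHT_HAND, [-0.30, 0.20, 1.20])]:
    goal[idx], mask[idx] = pos, True

# Future object goal: the object should reach this position...
goal[OBJECT], mask[OBJECT] = [1.0, 0.5, 0.8], True
target_time = 2.0                   # ...by this time offset, in seconds

# All other slots stay masked out; the policy is free to fill them in.

In this sketch, masking out every slot would correspond to the fully generative case described next.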
The third application we consider is generative behavior. Here, MaskedManipulator is not provided any constraints and must generate a full-body motion that is consistent with the object in front of it.
When no goal is provided, MaskedManipulator produces natural behavior that best matches the object in front of it.
@inproceedings{tessler2025maskedmanipulator,
  author    = {Tessler, Chen and Jiang, Yifeng and Coumans, Erwin and Luo, Zhengyi and Chechik, Gal and Peng, Xue Bin},
  title     = {MaskedManipulator: Versatile Whole-Body Manipulation},
  year      = {2025},
  booktitle = {ACM SIGGRAPH Asia 2025 Conference Proceedings}
}
MaskedManipulator: Versatile Whole-Body Manipulation
Chen Tessler, Yifeng Jiang, Erwin Coumans, Zhengyi Luo, Gal Chechik, and Xue Bin Peng