Toronto AI Lab
Physics Human Motion
Physics-based Human Motion Estimation and Synthesis from Videos

Kevin Xie 1,2,3
Tingwu Wang 1,2,3
Umar Iqbal 1
Yunrong Guo 1
Sanja Fidler 1,2,3
Florian Shkurti 2,3

2University of Toronto
3Vector Institute
ICCV 2021

We propose a trajectory optimization formulation using soft contact costs for recovering physically corrected motions from frame by frame pose estimation which can then be used to train motion synthesis models.

Human motion synthesis is an important problem with applications in graphics, gaming and simulation environments for robotics. Existing methods require accurate motion capture data for training, which is costly to obtain. Instead, we propose a framework for training generative models of physically plausible human motion directly from monocular RGB videos, which are much more widely available. At the core of our method is a novel optimization formulation that corrects imperfect image-based pose estimations by enforcing physics constraints and reasons about contacts in a differentiable way. This optimization yields corrected 3D poses and motions, as well as their corresponding contact forces. Results show that our physically-corrected motions significantly outperform prior work on pose estimation. We can then use these to train a generative model to synthesize future motion. Samples from the generative model can optionally be further refined with the same physics correction optimization. We demonstrate both qualitatively and quantitatively significantly improved motion estimation, synthesis quality and physical plausibility achieved by our method on the large scale Human3.6m dataset~\cite{h36m_pami} as compared to prior kinematic and physics-based methods. By enabling learning of motion synthesis from video, our method paves the way for large-scale, realistic and diverse motion synthesis.


Kevin Xie, Tingwu Wang, Umar Iqbal,
Yunrong Guo, Sanja Fidler, Florian Shkurti

Physics-based Human Motion Estimation and Synthesis from Videos
ICCV 2021



More Recovered Motions Combined

We recover high quality motions from video inputs only. Above we compose a sample of motions from the validation set of Human3.6m recovered using our method.

More Motions from in the Wild Videos

For in-the-wild motions, we do not assume access to ground truth camera parameters. Instead we optimize the parameters of the ground plane jointly with the motion. They are initialized such that the initial character motion is on average standing upright with respect to the ground plane and gravity assumed to be perpendicular.

More Motion Comparisons

Note below motion is shown from the side view camera (which is not the camera for the input video).

Motion Comparison Retargetted to Mesh

Physics-corrected Motion on Mesh

Failure Cases

Computer Man
Computer Man

We show frames selected from the highest error ones. Failure cases are generally due to inaccuracies in the underlying pose estimator due to occlusion or depth ambiguity. The motion stays physically correct.

We thank David Acuna for his website template.