AKD teaser

By incorporating articulation into static assets, AKD synthesizes realistic motions distilled from large video diffusion models.

Abstract

We present Articulated Kinematics Distillation (AKD), a framework for generating high-fidelity character animations by merging the strengths of skeleton-based animation and modern generative models. AKD uses a skeleton-based representation for rigged 3D assets, drastically reducing the degrees of freedom (DoFs) by focusing on joint-level control, which enables efficient, consistent motion synthesis. Through Score Distillation Sampling (SDS) with pre-trained video diffusion models, AKD distills complex, articulated motions while maintaining structural integrity, sidestepping the difficulty that 4D neural deformation fields face in preserving shape consistency. The approach is naturally compatible with physics-based simulation, ensuring physically plausible interactions. Experiments show that AKD achieves superior 3D consistency and expressive motion quality compared with existing text-to-4D generation methods.

Supplementary Video

Method

Motion Synthesis

AKD motion synthesis pipeline

We introduce articulated skeletons into generative motion synthesis. With this low-dimensional motion parameterization (a sequence of joint angles for the articulated bones), the synthesis can focus on motion modes rather than local-scale deformations. Given a text prompt, we first use a text-to-3D method to generate a 3D asset. The asset is deformed by the skeleton and differentiably rendered into videos. The SDS gradient is evaluated by a pre-trained video diffusion transformer and backpropagated to the joint angles.
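The optimization loop above can be sketched as follows. Everything here is an illustrative stand-in: the toy 2D forward kinematics replaces deforming and rendering the rigged asset, the squared distance to a target trajectory replaces the SDS objective from the video diffusion transformer, and finite differences replace backpropagation through a differentiable renderer. The sketch only shows the core idea of optimizing a low-dimensional sequence of joint angles against a video-level objective:

```python
import numpy as np

def forward_kinematics(angles, bone_len=1.0):
    """Toy 2D forward kinematics for a serial bone chain.

    angles: (T, J) joint angles per frame; returns (T, J, 2) bone endpoints.
    Stands in for deforming the rigged asset and rendering it into a video.
    """
    T, J = angles.shape
    pts = np.zeros((T, J, 2))
    for t in range(T):
        theta, pos = 0.0, np.zeros(2)
        for j in range(J):
            theta += angles[t, j]
            pos = pos + bone_len * np.array([np.cos(theta), np.sin(theta)])
            pts[t, j] = pos
    return pts

def motion_objective(angles, target):
    # Video-level discrepancy; a stand-in for the SDS objective evaluated
    # by a pre-trained video diffusion model.
    return np.sum((forward_kinematics(angles) - target) ** 2)

def distill_motion(target, steps=400, lr=2e-3, eps=1e-5):
    """Gradient descent on the joint-angle sequence.

    The real method backpropagates the SDS gradient through a differentiable
    renderer; finite differences keep this sketch dependency-free.
    """
    angles = np.zeros_like(target[..., 0])  # (T, J), start at the rest pose
    for _ in range(steps):
        base = motion_objective(angles, target)
        grad = np.zeros_like(angles)
        for idx in np.ndindex(*angles.shape):
            pert = angles.copy()
            pert[idx] += eps
            grad[idx] = (motion_objective(pert, target) - base) / eps
        angles -= lr * grad
    return angles
```

Note how the decision variable is just the (T, J) array of joint angles, which is what makes the search space so much smaller than that of a 4D neural deformation field.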

Physics-based Motion Tracking

AKD physics-based motion tracking pipeline

The articulated skeleton embedded in the asset can be deployed directly in articulated rigid-body simulators. To ground the synthesized motion in physics, we further project the distilled motion trajectory onto the nearest solution achievable by physics-based tracking in a simulation environment. We accomplish this generation-to-simulation transition by searching for a physical joint control sequence that minimizes the difference between the simulated and synthesized bone trajectories.
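As a sketch of this generation-to-simulation projection, the snippet below replaces the articulated rigid-body simulator with a toy single-joint, torque-driven system with damping, and searches for the control sequence by shooting (finite-difference gradient descent on the tracking error). Both the simulator and the optimizer are illustrative stand-ins, not the paper's actual setup:

```python
import numpy as np

def simulate(controls, dt=0.1, damping=0.5):
    """Toy stand-in for an articulated rigid-body simulator.

    One torque-driven joint with velocity damping, stepped explicitly.
    controls: (T,) torques; returns (T,) simulated joint angles.
    """
    theta, omega = 0.0, 0.0
    out = np.zeros(len(controls))
    for t, u in enumerate(controls):
        omega += dt * (u - damping * omega)
        theta += dt * omega
        out[t] = theta
    return out

def track(reference, steps=300, lr=0.05, eps=1e-4):
    """Search for controls minimizing sum_t (theta_sim - theta_ref)^2.

    Shooting with a finite-difference gradient over the whole control
    sequence; a real pipeline would use a differentiable or sampling-based
    trajectory optimizer over all joints.
    """
    u = np.zeros_like(reference)
    for _ in range(steps):
        base = np.sum((simulate(u) - reference) ** 2)
        grad = np.zeros_like(u)
        for i in range(len(u)):
            pert = u.copy()
            pert[i] += eps
            grad[i] = (np.sum((simulate(pert) - reference) ** 2) - base) / eps
        u -= lr * grad
    return u
```

The result of this search is the "nearest physically achievable" trajectory: whatever part of the synthesized motion the dynamics cannot reproduce (e.g., floating or foot skating) is filtered out by the simulator.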

Results

Demos

The floating issue in the synthesized motion can be resolved by gravity.
The foot-skating issue in the synthesized motion can be resolved through frictional contact.
a Komodo dragon is walking
a T-rex is walking
a camel is walking
a cat is walking
a corgi is walking
a crocodile is walking
a donkey is walking
a hippo is walking
a horse is walking
a lion is walking
a moose is walking
a rhino is walking
a rooster is walking
a triceratops is walking
an astronaut is walking
an elephant is walking

Motion Diversity

a lion is walking
a lion is running
a dog is walking
a dog is running
a bear is walking
a bear is running
a gorilla is walking
a gorilla is running

Comparisons

TC4D rarely produces alternating leg movements, and areas where the legs converge often appear blurry.
TC4D may show limited local-scale motions.
a T-rex is walking
a bear is walking
a camel is walking
a gorilla is walking
a hippo is walking
a lion is walking
a tortoise is walking
a triceratops is walking
an astronaut is walking
an ultraman is walking

Citation

@inproceedings{li2025akd,
  title={Articulated Kinematics Distillation from Video Diffusion Models},
  author={Li, Xuan and Ma, Qianli and Lin, Tsung-Yi and Chen, Yongxin and Jiang, Chenfanfu and Liu, Ming-Yu and Xiang, Donglai},
  booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025}
}