In this work, we present Conditional Adversarial Latent Models (CALM), an approach for generating diverse and directable behaviors for user-controlled interactive virtual characters. Using imitation learning, CALM learns a representation of movement that captures the complexity and diversity of human motion, and enables direct control over character movements. The approach jointly learns a control policy and a motion encoder that reconstructs key characteristics of a given motion without merely replicating it. The results show that CALM learns a semantic motion representation, enabling control over the generated motions and style-conditioning for higher-level task training. Once trained, the character can be controlled using intuitive interfaces, akin to those found in video games.
During low-level training, CALM jointly learns an encoder and a decoder. The encoder takes a motion from a reference dataset, represented as a time series of joint positions, and maps it to a low-dimensional latent representation. The decoder is a low-level policy that interacts with the simulator and generates motions similar to those in the reference dataset. This policy produces a variety of behaviors on demand, but is not conditioned on the directionality of the motion: it can be instructed to walk, yet offers no intuitive control over the direction of walking.
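To make the two components concrete, below is a minimal PyTorch sketch of their interfaces. The architectures, dimensions, and names (MotionEncoder, LowLevelPolicy, latent_dim) are illustrative assumptions rather than the paper's exact implementation; we also assume unit-norm latents, which makes the cosine-based comparison used later natural.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MotionEncoder(nn.Module):
        """Maps a motion window (T frames of J joint positions) to a latent z."""
        def __init__(self, num_frames, num_joints, latent_dim=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(num_frames * num_joints * 3, 512),
                nn.ReLU(),
                nn.Linear(512, latent_dim),
            )

        def forward(self, motion):                 # motion: (batch, T, J, 3)
            z = self.net(motion.flatten(start_dim=1))
            return F.normalize(z, dim=-1)          # keep latents unit-norm

    class LowLevelPolicy(nn.Module):
        """Decoder: a policy conditioned on the simulator state and a latent z."""
        def __init__(self, state_dim, action_dim, latent_dim=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim + latent_dim, 1024),
                nn.ReLU(),
                nn.Linear(1024, action_dim),
            )

        def forward(self, state, z):               # state: (batch, state_dim)
            return self.net(torch.cat([state, z], dim=-1))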
To evaluate the learned motion representation, we test the ability to interpolate between motions in latent space. Here, the initial latent is the representation of a sprint and the final latent is that of a crouching idle. Throughout the episode, the latent is linearly interpolated over time, moving from sprint towards crouch-idle. The character transitions smoothly through semantically meaningful intermediate behaviors, gradually reducing speed and tilting the upper body.
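The interpolation test reduces to a short loop. The sketch below reuses the encoder and policy from the sketch above; env, the reference clips, and episode_len are placeholders for the simulation setup, not names from the paper.

    import torch.nn.functional as F

    episode_len = 300                      # hypothetical number of control steps
    z_sprint = encoder(sprint_clip)        # latent of the sprint reference clip
    z_crouch = encoder(crouch_idle_clip)   # latent of the crouching-idle clip

    state = env.reset()
    for t in range(episode_len):
        alpha = t / (episode_len - 1)      # 0 -> pure sprint, 1 -> pure crouch-idle
        # Blend linearly, then re-normalize so the latent stays on the unit sphere.
        z = F.normalize((1 - alpha) * z_sprint + alpha * z_crouch, dim=-1)
        state = env.step(policy(state, z))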
To control motion direction, we train a high-level task-driven policy that selects latent variables. These latents are provided to the low-level policy, which generates the requested motion. Here, the learned motion representation enables a form of style-conditioning. To achieve this, the motion encoder is used to obtain the latent representation of the requested motion. The high-level policy is then given an additional reward proportional to the cosine similarity between the latents it selects and the latent representing the requested style, guiding it to adopt the desired behavioral style. For example, here a directionality controller is trained to control both the form of locomotion performed -- crouch-walk, walk shield-up, or run -- and the direction in which the character performs it.
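The style reward itself is simple to sketch. In the snippet below, z_t is the latent selected by the high-level policy at the current step, z_style is the encoding of the requested motion, and w_style is a hypothetical weighting coefficient, not a value from the paper.

    import torch.nn.functional as F

    def style_reward(z_t, z_style, w_style=0.5):
        # With unit-norm latents, cosine similarity peaks at 1.0 when the
        # selected latent matches the requested style exactly.
        return w_style * F.cosine_similarity(z_t, z_style, dim=-1)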
Finally, the previously trained models (the low-level policy and the directional controller) are combined to compose complex movements without additional training. To do so, the user constructs a finite-state machine (FSM) of rules and commands that determine which motion to perform, similar to how a player controls a video game character. These rules specify whether the character should perform a simple motion, executed directly by the low-level policy, or a directed motion requiring high-level control. For example, one may construct an FSM such as (a) "crouch-walk towards the target, until distance < 1m", then (b) "kick", and finally (c) "celebrate".
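A hedged sketch of what such an FSM might look like in code; run_fsm, high_level_policy, encoder_lookup, and the state attributes (dist_to_target, motion_done) are all illustrative placeholders under the assumptions of the earlier sketches.

    fsm = [
        # (command, needs high-level control, condition to advance)
        ("crouch-walk to target", True,  lambda s: s.dist_to_target < 1.0),
        ("kick",                  False, lambda s: s.motion_done),
        ("celebrate",             False, lambda s: s.motion_done),
    ]

    def run_fsm(env, fsm, max_steps=1000):
        state, idx = env.reset(), 0
        for _ in range(max_steps):
            command, directed, advance = fsm[idx]
            # Directed motions route through the high-level controller; simple
            # motions use a pre-computed latent for the named command.
            z = high_level_policy(state) if directed else encoder_lookup(command)
            state = env.step(low_level_policy(state, z))
            if advance(state) and idx + 1 < len(fsm):
                idx += 1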
@inproceedings{tessler2023calm,
  author    = {Tessler, Chen and Kasten, Yoni and Guo, Yunrong and Mannor, Shie and Chechik, Gal and Peng, Xue Bin},
  title     = {CALM: Conditional Adversarial Latent Models for Directable Virtual Characters},
  year      = {2023},
  isbn      = {9798400701597},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  url       = {https://doi.org/10.1145/3588432.3591541},
  doi       = {10.1145/3588432.3591541},
  booktitle = {ACM SIGGRAPH 2023 Conference Proceedings},
  keywords  = {reinforcement learning, animated character control, adversarial training, motion capture data},
  location  = {Los Angeles, CA, USA},
  series    = {SIGGRAPH '23}
}