Quick Start#

This page provides a quick introduction to motion generation with Kimodo. For detailed explanations, we recommend reviewing the full documentation pages linked in each section.

Before running these commands, follow the installation guide to install Kimodo in a virtual environment or using Docker.

Overview: Kimodo Models#

Motion generation can be performed with several trained Kimodo models that vary by skeleton and training dataset.

Note: models will be downloaded automatically when attempting to generate from the CLI or Interactive Demo, so there is no need to download them manually

Model	Skeleton	Training Data	Release Date	Hugging Face	License
Kimodo-SOMA-RP-v1.1	SOMA	Bones Rigplay 1	April 10, 2026	Link	NVIDIA Open Model
Kimodo-SOMA-SEED-v1.1	SOMA	BONES-SEED	April 10, 2026	Link	NVIDIA Open Model
Kimodo-SOMA-RP-v1	SOMA	Bones Rigplay 1	March 16, 2026	Link	NVIDIA Open Model
Kimodo-G1-RP-v1	Unitree G1	Bones Rigplay 1	March 16, 2026	Link	NVIDIA Open Model
Kimodo-SOMA-SEED-v1	SOMA	BONES-SEED	March 16, 2026	Link	NVIDIA Open Model
Kimodo-G1-SEED-v1	Unitree G1	BONES-SEED	March 16, 2026	Link	NVIDIA Open Model
Kimodo-SMPLX-RP-v1	SMPL-X	Bones Rigplay 1	March 16, 2026	Link	NVIDIA R&D Model

By default, we recommend using the models trained on the full Bones Rigplay dataset (700 hours of mocap) for your motion generation needs. The models trained on BONES-SEED use 288 hours of publicly available mocap data so are less capable, but are useful for comparing your own trained models on the same dataset. See the benchmark for a standardized evaluation suite on BONES-SEED.

Recommended Hardware#

Kimodo requires ~17GB of VRAM to generate locally entirely on GPU, due primarily to the size of the text embedding model. If you have a smaller card, set TEXT_ENCODER_DEVICE=cpu when running Kimodo commands to force text encoding to the CPU. This is slightly slower but reduces VRAM usage to <3 GB.

The model has been most extensively tested on GeForce RTX 3090, GeForce RTX 4090, and NVIDIA A100 GPUs, but it should work on other recent cards with sufficient VRAM.

Run Text-Encoder Service#

Motion generation relies on embedding the input text prompt, which becomes the input to Kimodo. Although it is fine to run the CLI commands and demo on their own, it may be preferred to start the text encoder service in the background, which can be shared across all motion generation requests. This is much more efficient when making many consecutive CLI calls, as it avoids needing to instantiate the large text encoder every time.

To start the text encoder service:

kimodo_textencoder

The first run of the service will take a while as it downloads the embedding model. We recommend running this in the background or in a separate terminal where it will stay open and usable by other scripts.

If you are using the Docker set up, the service can alternatively be started in the container with:

docker compose up text-encoder

Note: when the text encoder is initialized, the transformers library will report several unexpected and missing layers for LLM2Vec. These are expected and can be safely ignored.

If you are running on a GPU with <16 GB VRAM, you can force the text encoder to the CPU, for example:

TEXT_ENCODER_DEVICE=cpu kimodo_textencoder

Command-Line Text-to-Motion Generation#

CLI Documentation

You can generate motions from the command line using the generate script:

kimodo_gen "A person walks forward." \
    --model Kimodo-SOMA-RP-v1 \
    --duration 5.0 \
    --output output

The --model command corresponds to the model name in the table above. The output motion will be saved using the stem name given by --output in the Kimodo output format. For a detailed description of all generation arguments, including how to generate motion with constraints, see the full CLI documentation.

If you set up Kimodo with Docker, you can instead run generation inside the Docker container, replacing kimodo_gen XXX with docker compose run --rm demo kimodo_gen XXX. If you will be running generation multiple times, it is better to start the demo container (e.g., in another terminal or in the background), and then run commands inside it with docker compose exec demo kimodo_gen XXX.

Interactive Motion Authoring Demo#

Demo Documentation

The demo allows easily generating motions with an intuitive control interface for text prompting and constraints.

The demo can be started with:

kimodo_demo

The demo is a webapp that will run on http://localhost:7860. Open this URL in your browser to access the interface.

If you are using Docker, the demo can be launched with:

docker compose up demo

or if you want to start the demo and text encoder service (explained below) at the same time, use:

docker compose up

Additional Tips for Docker

You may find the following commands useful if running Kimodo within the Docker containers. In the example commands below, you can also replace demo by text-encoder:

Check logs:

docker compose logs demo

Stop service:

docker compose stop demo

Restart service:

docker compose restart demo

Stop and remove everything:

docker compose down