GAIA:
Generative Animatable Interactive Avatars with Expression-conditioned Gaussians

SIGGRAPH 2025

GAIA leverages 3D Gaussians as an expressive volumetric representation to capture complex details of human heads such as hair and eyes. GAIA generates animation-ready Gaussian avatars by learning only from in-the-wild image datasets. GAIA supports photorealistic novel view synthesis and individual control of identity and expression. With efficient generation and rendering, GAIA is readily available for interactive animation and editing.

Abstract

3D generative models of faces trained on in-the-wild image collections have improved greatly in recent years, offering better visual fidelity and view consistency. Making such generative models animatable is a hard yet rewarding task, with applications in virtual AI agents, character animation, and telepresence. However, it is not trivial to learn a well-behaved animation model in the generative setting, as the learned latent space aims to best capture the data distribution, often omitting details such as dynamic appearance and entangling animation with other factors that affect controllability. We present GAIA: Generative Animatable Interactive Avatars, which generates high-fidelity 3D head avatars for both realistic animation and rendering. To achieve consistency during animation, we learn to generate Gaussians embedded in an underlying morphable model for human heads via a shared UV parameterization. For modeling realistic animation, we further design the generator to learn expression-conditioned details for both geometric deformation and dynamic appearance. Finally, facing an inevitable entanglement problem between facial identity and expression, we propose a novel two-branch architecture that encourages the generator to disentangle identity and expression. On existing benchmarks, GAIA achieves state-of-the-art performance in visual quality as well as realistic animation. The generated Gaussian-based avatar supports highly efficient animation and rendering, making it readily available for interactive animation and appearance editing.

Approach

GAIA leverages 3D Gaussians as an expressive volumetric representation to capture complex details of human heads such as hair and eyes. With expression-conditioned Gaussians, GAIA learns geometric deformation and dynamic appearance while remaining consistently animatable through an underlying 3D morphable model. We further propose a novel two-branch architecture that encourages the generator to disentangle identity and expression, learned only from in-the-wild image datasets.
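The core idea of anchoring Gaussians in a shared UV parameterization with expression-conditioned offsets can be illustrated with a minimal NumPy sketch. This is a toy illustration, not the paper's actual model: the "morphable model" here is a simple sphere patch, and the expression-conditioned deformation is a linear map with random weights standing in for a learned network.

```python
import numpy as np

def base_mesh_from_uv(uv):
    # Toy stand-in for a morphable model: map UV coordinates onto a
    # unit-sphere surface patch (a real system would sample the head mesh).
    theta = uv[:, 0] * np.pi  # azimuth from U
    phi = uv[:, 1] * np.pi    # inclination from V
    return np.stack([np.sin(phi) * np.cos(theta),
                     np.sin(phi) * np.sin(theta),
                     np.cos(phi)], axis=-1)

def expression_offsets(uv, expr_code, weights):
    # Expression-conditioned geometric deformation: predict a per-Gaussian
    # 3D displacement from the UV anchor and the expression code.
    feats = np.concatenate(
        [uv, np.tile(expr_code, (uv.shape[0], 1))], axis=1)
    return feats @ weights  # shape (N, 3)

rng = np.random.default_rng(0)
uv = rng.uniform(0.0, 1.0, size=(256, 2))   # Gaussian anchors in UV space
expr = np.array([0.5, -0.2])                # toy expression code
W = rng.normal(scale=0.01, size=(4, 3))     # toy "learned" weights

# Gaussian means = morphable-model surface point + expression offset.
means = base_mesh_from_uv(uv) + expression_offsets(uv, expr, W)
print(means.shape)  # (256, 3)
```

Because every Gaussian is tied to a UV location on the underlying surface, its mean stays near the morphable-model geometry for any expression code, which is what makes the animation consistent across identities in the shared parameterization.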

Results

GAIA produces photorealistic 3D Gaussian avatars with animation capabilities. Here we show the 3D-consistent renderings as well as visualizations of the Gaussians and geometry (depth) of the avatar while traversing in the latent identity space and expression space.


GAIA supports disentangled animation control of identity and expression (with blendshape parameters and rigging parameters, e.g., eyes, jaw, etc.) with efficient generation and rendering, reaching an interactive speed of 43 FPS on an NVIDIA RTX A6000 GPU.


Citation

@inproceedings{
  yu25gaia,
  title={{GAIA}: Generative Animatable Interactive Avatars with Expression-conditioned Gaussians},
  author={Zhengming Yu and Tianye Li and Jingxiang Sun and Omer Shapira and Seonwook Park and Michael Stengel and Matthew Chan and Xin Li and Wenping Wang and Koki Nagano and Shalini De Mello},
  booktitle={ACM SIGGRAPH},
  year={2025},
}

Acknowledgement

This website is based on the template from QUEEN and BLADE.