Gamma-World:
Generative Multi-Agent World Modeling
Beyond Two Players

¹NVIDIA ²Tsinghua University ³University of Toronto ⁴Vector Institute
*Equal contribution. Joint advising.

γ-World interactively generates coherent future frames from multi-agent actions while preserving shared-world consistency, scaling from virtual games to real-world environments.

γ-World Teaser

Abstract


World models for interactive video generation have largely focused on single-agent settings, where future observations are rolled out from a single action stream, user input, or controllable viewpoint. However, many simulated worlds are inherently populated: multiple players, robots, or embodied agents act simultaneously within a shared, evolving environment. Scaling world models to such settings requires a principled multi-agent design: agents should remain independently controllable and permutation-symmetric, and the model should support efficient inference while maintaining consistency across time and perspectives.

In this paper, we present γ-World, a generative multi-agent world model for interactive simulation. γ-World introduces Simplex Rotary Agent Encoding, a parameter-free extension of 3D RoPE that represents agents as vertices of a regular simplex in rotary angle space. This gives each agent a distinct phase while making all agents permutation-equivalent, enabling scalable agent identity without learned per-slot identities or a fixed agent ordering.

To support efficient cross-agent interaction, we further propose Sparse Hub Attention, in which learnable hub tokens mediate communication across agents, reducing cross-agent attention cost from quadratic to linear in the number of agents. Finally, we distill a bidirectional multi-agent teacher into a block-causal student. The resulting causal model supports KV caching for streaming and achieves real-time, action-responsive rollouts at 24 FPS.
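As a rough illustration of why a block-causal model pairs naturally with KV caching, the sketch below builds a block-causal mask over frames and checks that a streaming loop with a growing cache sees exactly the same context. The function names, the frame-level granularity, and the block size are assumptions made for illustration; the page does not specify these details.

```python
def block_causal_mask(n_frames, block_size):
    """mask[q][k] is True when frame q may attend to frame k.

    Frames attend freely inside their own block and to all earlier blocks,
    but never to later blocks (assumed layout, for illustration only).
    """
    return [[(k // block_size) <= (q // block_size) for k in range(n_frames)]
            for q in range(n_frames)]


def stream_blocks(n_frames, block_size):
    """Yield (block_frames, visible_frames) as a KV-cached rollout would see them."""
    cache = []  # stands in for the cached keys/values of past blocks
    for start in range(0, n_frames, block_size):
        block = list(range(start, min(start + block_size, n_frames)))
        yield block, cache + block
        cache.extend(block)  # past blocks are stored once and never recomputed


mask = block_causal_mask(n_frames=6, block_size=2)
for block, visible in stream_blocks(n_frames=6, block_size=2):
    for q in block:
        # The streamed context matches the corresponding row of the mask.
        assert [k for k in range(6) if mask[q][k]] == visible
```

Because no frame ever attends to a later block, the keys and values of finished blocks can be cached and reused as new actions arrive, which is what makes streaming rollouts possible.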

Experiments in multiplayer virtual environments show that γ-World improves video fidelity, action controllability, and inter-agent consistency over slot-based and dense-attention baselines, while generalizing from two to four players without additional training.

Method


Method overview

Architecture overview. γ-World takes per-agent action streams and produces a shared, multi-view rollout. Two key designs make it scalable to many agents:

Simplex Rotary Agent Encoding

A parameter-free extension of 3D RoPE that represents agents as vertices of a regular simplex in rotary angle space. Each agent receives a distinct phase while remaining permutation-equivalent, eliminating the need for learned per-slot identities or a fixed agent ordering.
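One way to picture the simplex idea is sketched below. It builds the vertices of a regular simplex, scales them into per-agent phase offsets, and applies a RoPE-style rotation to feature pairs. The specific vertex construction (unit basis vectors minus their centroid), the `scale` value, and the mapping of one simplex coordinate to one rotary pair are all assumptions for illustration, not the construction used in γ-World.

```python
import math


def simplex_vertices(n_agents):
    """Vertices of a regular simplex: unit basis vectors minus their centroid."""
    c = 1.0 / n_agents
    return [[(1.0 if k == i else 0.0) - c for k in range(n_agents)]
            for i in range(n_agents)]


def agent_phases(n_agents, scale=math.pi / 2):
    """Hypothetical phase offsets: one angle per rotary pair, per agent."""
    return [[scale * x for x in v] for v in simplex_vertices(n_agents)]


def rotate_pairs(vec, phases):
    """Apply a RoPE-style 2D rotation to consecutive (even, odd) feature pairs."""
    out = []
    for k, theta in enumerate(phases):
        x, y = vec[2 * k], vec[2 * k + 1]
        out += [x * math.cos(theta) - y * math.sin(theta),
                x * math.sin(theta) + y * math.cos(theta)]
    return out


def dist(a, b):
    """Euclidean distance between two phase vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


phases = agent_phases(4)
pairwise = [dist(phases[i], phases[j])
            for i in range(4) for j in range(i + 1, 4)]
# Every pair of agents is equally far apart in phase space.
assert max(pairwise) - min(pairwise) < 1e-12
```

In this sketch no parameters are learned, and since all pairwise phase distances are equal, relabeling the agents only permutes the vertices without changing the geometry.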

Sparse Hub Attention

Learnable hub tokens mediate communication across agents, reducing cross-agent attention cost from quadratic to linear in the number of agents, which enables efficient scaling to four or more agents.
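The routing pattern can be sketched as an attention mask. In the minimal example below, tokens attend densely within their own agent, every agent token exchanges information with the hub tokens, and no agent attends directly to another agent. The token layout, the function names, and the choice to keep within-agent attention dense are assumptions for illustration; the actual γ-World design may differ.

```python
def hub_attention_mask(n_agents, tokens_per_agent, n_hubs):
    """Boolean mask[q][k]: True if query token q may attend to key token k.

    Assumed layout: agent tokens first, grouped by agent, then hub tokens.
    """
    n_agent_tokens = n_agents * tokens_per_agent
    total = n_agent_tokens + n_hubs
    # owner[t] is the agent index of token t, or -1 for a hub token.
    owner = [t // tokens_per_agent for t in range(n_agent_tokens)] + [-1] * n_hubs
    mask = [[False] * total for _ in range(total)]
    for q in range(total):
        for k in range(total):
            same_agent = owner[q] == owner[k] and owner[q] != -1
            touches_hub = owner[q] == -1 or owner[k] == -1
            mask[q][k] = same_agent or touches_hub
    return mask


def allowed_pairs(mask):
    """Number of (query, key) pairs the mask permits."""
    return sum(sum(row) for row in mask)


# Going from 2 to 4 to 6 agents adds the same number of pairs each time,
# so the count grows linearly in the number of agents.
counts = [allowed_pairs(hub_attention_mask(n, 3, 1)) for n in (2, 4, 6)]
assert counts[1] - counts[0] == counts[2] - counts[1]
```

Under these assumptions, information from one agent reaches another only through the hub tokens, so adding an agent adds a fixed number of attention pairs instead of one set per existing agent.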

Efficiency: Sparse Hub Attention

Sparse Hub Attention scales linearly with the number of agents, while dense attention scales quadratically.

Sparse Hub Attention Timing
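A back-of-the-envelope count makes the scaling concrete. The symbols are assumptions introduced here, not notation from the paper: N agents, T tokens per agent, and H hub tokens, with dense attention inside each agent, agent-to-hub attention in both directions, and hub-to-hub attention.

```latex
\[
C_{\text{dense}} = (NT)^2 = N^2 T^2,
\qquad
C_{\text{hub}} = N T^2 + 2NTH + H^2 .
\]
```

With T and H held fixed, C_hub grows linearly in N while C_dense grows quadratically. For example, N = 4, T = 3, H = 1 gives 144 attended pairs for dense attention versus 61 under the hub pattern.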

Acknowledgement

The authors would like to thank Product Managers Aditya Mahajan and Matt Cragun for their valuable support and guidance, Jingnan Gao for proof discussion, and Yixin Hong for demo creation.

Citation

@article{gammaworld2026,
    title={Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players},
    author={Fangfu Liu and Kai He and Tianchang Shen and Tianshi Cao and
            Sanja Fidler and Yueqi Duan and Jun Gao and Igor Gilitschenski and
            Zian Wang and Xuanchi Ren},
    journal={arXiv preprint arXiv:2605.28816},
    year={2026}
}