Simulation is a crucial component of any robotic system. In order to simulate correctly, we need to write complex rules for the environment: how dynamic agents behave, and how the actions of each agent affect the behavior of the others. In this paper, we aim to learn a simulator by simply watching an agent interact with an environment. We focus on graphics games as a proxy for the real environment. We introduce GameGAN, a generative model that learns to visually imitate a desired game by ingesting screenplay and keyboard actions during training. Given a key pressed by the agent, GameGAN "renders" the next screen using a carefully designed generative adversarial network. Our approach offers key advantages over existing work: we design a memory module that builds an internal map of the environment, allowing the agent to return to previously visited locations with high visual consistency. In addition, GameGAN is able to disentangle static and dynamic components within an image, making the behavior of the model more interpretable and relevant for downstream tasks that require explicit reasoning over dynamic elements. This enables many interesting applications, such as swapping different components of the game to build new games that do not exist.
We are interested in training a game simulator that can model both the deterministic and the stochastic nature of the environment. GameGAN is composed of three modules. 1) The dynamics engine maintains an internal state variable which is recurrently updated. 2) For environments that require long-term consistency, an external memory module is used to remember what the model has generated so far. 3) Finally, the rendering engine is used to decode the output image at each time instance. All modules are neural networks and are trained end-to-end.
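To make the data flow between the three modules concrete, the following is a minimal sketch in plain Python. It is a toy stand-in and not the architecture used in GameGAN: the class names, the fixed random weights (in place of learned parameters), the one-dimensional ring-shaped memory, the integer actions in {-1, 0, +1}, and the default sizes are all assumptions made for illustration. No training and no adversarial loss are shown.

```python
import math
import random


class DynamicsEngine:
    """Toy recurrent state update; weights are random stand-ins, not learned."""

    def __init__(self, state_size, rng):
        self.h = [0.0] * state_size
        self.w_h = [rng.uniform(-0.5, 0.5) for _ in range(state_size)]
        self.w_a = [rng.uniform(-0.5, 0.5) for _ in range(state_size)]
        self.w_m = [rng.uniform(-0.5, 0.5) for _ in range(state_size)]

    def step(self, action, mem_read, noise):
        # The new state depends on the previous state, the action, the value
        # read from memory, and a noise term (the stochastic part).
        self.h = [
            math.tanh(wh * h + wa * action + wm * mem_read + noise)
            for h, wh, wa, wm in zip(self.h, self.w_h, self.w_a, self.w_m)
        ]
        return self.h


class ExternalMemory:
    """Toy 1-D ring of cells (hypothetical layout) with a movable pointer."""

    def __init__(self, num_cells):
        self.cells = [0.0] * num_cells
        self.pos = 0

    def read(self):
        return self.cells[self.pos]

    def move_and_write(self, action, h):
        # Move the pointer by the action, then store a summary of the state
        # so that a later visit to this cell can retrieve it.
        self.pos = (self.pos + action) % len(self.cells)
        self.cells[self.pos] = sum(h) / len(h)


class RenderingEngine:
    """Toy decoder mapping (state, memory readout) to pixels in (0, 1)."""

    def __init__(self, state_size, num_pixels, rng):
        self.w = [
            [rng.uniform(-1.0, 1.0) for _ in range(state_size)]
            for _ in range(num_pixels)
        ]

    def render(self, h, mem_read):
        frame = []
        for row in self.w:
            z = sum(w * x for w, x in zip(row, h)) + mem_read
            frame.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid
        return frame


def simulate(actions, state_size=4, num_cells=8, num_pixels=6, seed=0):
    """Roll the three modules forward, producing one frame per action."""
    rng = random.Random(seed)
    dynamics = DynamicsEngine(state_size, rng)
    memory = ExternalMemory(num_cells)
    renderer = RenderingEngine(state_size, num_pixels, rng)
    frames = []
    for action in actions:
        noise = rng.gauss(0.0, 0.1)
        h = dynamics.step(action, memory.read(), noise)
        memory.move_and_write(action, h)
        frames.append(renderer.render(h, memory.read()))
    return frames
```

Calling `simulate([1, 1, -1, 0])` yields four frames of six pixel values each. The point of the sketch is only the ordering: the state is updated from the action and the memory readout, the memory is then written, and the renderer decodes the frame from both.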
PAC-MAN™&©BANDAI NAMCO Entertainment Inc.
1. We modified the version of Pacman from http://ai.berkeley.edu/project_overview.html to create random mazes for training data.
2. We used https://github.com/hardmaru/WorldModelsExperiments to extract training data for VizDoom.
Left: Final output (Static + Dynamic), Middle: Static component, Right: Dynamic component