Real-time character control is an essential component of interactive experiences, with applications spanning physics simulation, video games, and virtual reality. The success of diffusion models for image synthesis has prompted recent work exploring their use for motion synthesis. However, most of these motion diffusion models are designed for offline applications: they employ space-time architectures that synthesize an entire sequence of frames simultaneously, with a pre-specified length.