Microsoft has taken an exciting step forward by introducing WHAMM, a cutting-edge AI model that lets you play Quake II in real time. This innovation comes from Microsoft’s Copilot Labs and aims to redefine how we think about AI in gaming. WHAMM, which stands for World and Human Action MaskGIT Model, builds on the previous WHAM-1.6B model. While the older model was trained on the game Bleeding Edge and generated only about one frame per second, WHAMM significantly ups the ante, delivering over ten frames per second for a playable real-time experience.
Part of Microsoft’s “Muse” model family, which focuses on game development, WHAMM is designed to learn from much less data. Unlike WHAM-1.6B, which required seven years of gameplay data, WHAMM needed just a week’s worth of Quake II gameplay from a single level. This dataset was curated by professional testers to ensure high-quality learning of in-game dynamics.
Technically, WHAMM diverges from its predecessor by adopting a MaskGIT-style decoding strategy, in which image tokens are generated in parallel over a handful of iterations rather than one at a time. This parallelism is what makes real-time generation feasible, and it comes alongside a jump in output resolution from 300 x 180 to 640 x 360 pixels. The pipeline involves three stages: a ViT-VQGAN converts each image into discrete tokens, a backbone transformer predicts the next frame’s tokens from the recent image-action context, and a secondary transformer refines those predictions.
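The MaskGIT-style decoding loop described above can be sketched roughly as follows. Everything here is a simplified stand-in: the predictor is a random placeholder rather than WHAMM’s transformer, and the cosine unmasking schedule is the one from the original MaskGIT recipe, which WHAMM may tune differently. The core idea survives: start fully masked, predict all masked positions in parallel each iteration, and commit only the most confident tokens.

```python
import math
import random

def maskgit_decode(num_tokens=16, steps=4, predict=None):
    """Sketch of MaskGIT-style parallel decoding.

    Starts with every image token masked, then over `steps` iterations
    predicts all masked positions "in parallel" and commits only the
    most confident predictions, per a cosine unmasking schedule.
    """
    if predict is None:
        # Hypothetical predictor: returns a (token, confidence) pair
        # for a position. In WHAMM this is the backbone transformer.
        predict = lambda i: (random.randrange(256), random.random())

    tokens = [None] * num_tokens  # None marks a masked position
    for step in range(1, steps + 1):
        # Cosine schedule: fraction of tokens still masked after this step.
        frac_masked = math.cos(math.pi / 2 * step / steps)
        keep_masked = int(frac_masked * num_tokens)

        # Predict every currently-masked position (conceptually in parallel).
        proposals = [(i, *predict(i)) for i, t in enumerate(tokens) if t is None]

        # Commit the most confident proposals; the rest stay masked.
        proposals.sort(key=lambda p: p[2], reverse=True)
        n_commit = len(proposals) - keep_masked
        for i, tok, _conf in proposals[:n_commit]:
            tokens[i] = tok
    return tokens
```

Because many tokens are committed per iteration instead of one per forward pass, a full frame’s token grid is produced in a handful of passes, which is where the speedup over purely sequential generation comes from.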
WHAMM’s architecture features two main components: a backbone transformer with around 500 million parameters for initial predictions and a refinement module with 250 million parameters to enhance output. Each frame is generated using the previous nine image-action pairs as context.
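The nine-pair context window can be illustrated with a minimal sliding-window loop. The function and parameter names here are hypothetical, and `step_model` is a stand-in for the backbone-plus-refinement stack; the point is simply that each new frame is conditioned on the previous nine (image, action) pairs, and the window then slides forward by one.

```python
from collections import deque

CONTEXT_LEN = 9  # previous image-action pairs used as context per frame

def generate_frames(seed_frames, seed_actions, new_actions, step_model):
    """Sketch of an autoregressive frame loop with a sliding context.

    `step_model(context, action)` stands in for WHAMM's transformers:
    it maps the last CONTEXT_LEN (frame, action) pairs plus the player's
    new action to the next frame.
    """
    # deque(maxlen=...) gives us the sliding window for free: appending
    # a tenth pair silently drops the oldest one.
    context = deque(zip(seed_frames, seed_actions), maxlen=CONTEXT_LEN)
    frames = []
    for action in new_actions:
        frame = step_model(list(context), action)
        frames.append(frame)
        context.append((frame, action))  # generated frame re-enters context
    return frames
```

Feeding each generated frame back into the window is what lets the model keep simulating indefinitely, but it is also why errors compound: any blur or inconsistency in one frame becomes part of the evidence for the next.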
Now, you can test the AI-generated Quake II demo, which includes basic interactions like moving, jumping, and shooting. You can also change the environment and explore hidden sections of the level. However, WHAMM doesn’t completely replicate the original Quake II experience. Its narrow training dataset leads to some limitations, such as visual blurring of enemies, unrealistic combat scenarios, unreliable health indicators, and disappearing objects. The playable area is confined to a single level segment, the simulation ends once you reach the end of that segment, and input latency remains noticeable.
WHAMM is part of a broader exploration into AI-driven game development. Similar projects include GameGen-O for open-world simulations, as well as Google DeepMind’s GameNGen and the DIAMOND project, which simulate gameplay for DOOM and Counter-Strike respectively. While these models hold great promise, they face technical challenges such as low-resolution output and reduced contextual awareness. The gaming industry’s multidisciplinary nature, spanning coding, design, storytelling, and multimedia, makes it an ideal candidate for generative AI, helping to streamline development within tight budgets and timelines.