arxiv:2408.14837

Diffusion Models Are Real-Time Game Engines

Published on Aug 27 · Submitted by akhaliq on Aug 28
#2 Paper of the day

Abstract

We present GameNGen, the first game engine powered entirely by a neural model that enables real-time interaction with a complex environment over long trajectories at high quality. GameNGen can interactively simulate the classic game DOOM at over 20 frames per second on a single TPU. Next frame prediction achieves a PSNR of 29.4, comparable to lossy JPEG compression. Human raters are only slightly better than random chance at distinguishing short clips of the game from clips of the simulation. GameNGen is trained in two phases: (1) an RL-agent learns to play the game and the training sessions are recorded, and (2) a diffusion model is trained to produce the next frame, conditioned on the sequence of past frames and actions. Conditioning augmentations enable stable auto-regressive generation over long trajectories.
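To put the reported PSNR of 29.4 in perspective, here is a minimal sketch of the standard PSNR formula, 10 * log10(MAX^2 / MSE). It assumes 8-bit pixel values (MAX = 255); the abstract does not say how the authors computed the metric, so the function names and the pixel range are illustrative assumptions only.

```python
import math

def psnr(reference, predicted, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixels.
    mse = sum((r - p) ** 2 for r, p in zip(reference, predicted)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)

def rmse_for_psnr(psnr_db, max_value=255.0):
    """Invert the formula: the RMS pixel error that corresponds to a given PSNR."""
    return max_value / (10.0 ** (psnr_db / 20.0))

# Assumption: 8-bit pixel range. Under it, 29.4 dB corresponds to an
# RMS error of roughly 8.6 intensity levels per pixel.
print(round(rmse_for_psnr(29.4), 2))
```

Under the 8-bit assumption, a PSNR of 29.4 dB means the predicted frame is off by about 8.6 intensity levels (out of 255) in the root-mean-square sense.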

Community

Paper submitter

Amazing work!

Is this the real life?

Is this just fantasy?

Amazing

🔥🔥🔥

We need similar work, but on Yume Nikki fangames.

I did a post about this paper! Here's the summary:

🎮 A neural network simulates DOOM: opening the way for completely AI-generated games!

Imagine if games were completely live-generated by an AI model: the NPCs and their dialogues, the storyline, and even the game environment. The player's in-game actions would have a real, lasting impact on the game story.

In a very exciting paper, Google researchers just gave us the first credible glimpse of this future.

➡️ They created GameNGen, the first neural model that can simulate a complex 3D game in real time. They use it to simulate the classic game DOOM at over 20 frames per second on a single TPU, with image quality comparable to lossy JPEG compression. And it feels just like the real game!

Here's how they did it:

  1. They trained an RL agent to play DOOM and recorded its gameplay sessions.
  2. They then used these recordings to train a diffusion model to predict the next frame, based on past frames and player actions.
  3. During inference, they use only 4 denoising steps (instead of the usual dozens) to generate each frame quickly.
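The steps above can be sketched as an autoregressive loop: each new frame starts from noise, is refined over 4 denoising steps conditioned on a sliding window of past frames and actions, and is then fed back as context. This is a toy illustration, not the authors' code: `toy_denoiser` is a hypothetical stand-in for the trained network, frames are short lists of floats, and `CONTEXT_FRAMES = 60` is an assumption derived from "3 seconds" at 20 FPS. Real diffusion samplers also follow a noise schedule that is omitted here.

```python
import random
from collections import deque

CONTEXT_FRAMES = 60  # assumption: about 3 s of context at 20 FPS
DENOISE_STEPS = 4    # number of denoising steps quoted in the summary

def toy_denoiser(noisy, past_frames, past_actions, step):
    """Hypothetical stand-in for the trained model.

    It pulls the sample toward "last frame shifted by last action",
    which is only a placeholder for learned game dynamics.
    """
    target = [p + past_actions[-1] for p in past_frames[-1]]
    blend = 1.0 / (DENOISE_STEPS - step)  # 1/4, 1/3, 1/2, then 1
    return [n + blend * (t - n) for n, t in zip(noisy, target)]

def generate_frame(past_frames, past_actions, rng, frame_size):
    """Start from pure noise and refine it in a few denoising steps."""
    sample = [rng.gauss(0.0, 1.0) for _ in range(frame_size)]
    for step in range(DENOISE_STEPS):
        sample = toy_denoiser(sample, past_frames, past_actions, step)
    return sample

def rollout(first_frame, actions, seed=0):
    """Generate one frame per action, feeding outputs back as context."""
    rng = random.Random(seed)
    frames = deque([first_frame], maxlen=CONTEXT_FRAMES)
    past_actions = deque(maxlen=CONTEXT_FRAMES)
    generated = []
    for action in actions:
        past_actions.append(action)
        frame = generate_frame(list(frames), list(past_actions), rng, len(first_frame))
        frames.append(frame)  # the model's own output becomes future context
        generated.append(frame)
    return generated

frames = rollout([0.0, 1.0], actions=[1, 1, -1])
print(len(frames))
```

At 20 FPS the whole loop body has a budget of 50 ms per frame, which is why cutting the number of denoising steps matters. Because the model conditions on its own outputs, small errors can compound over a long session.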

๐—ž๐—ฒ๐˜† ๐—ถ๐—ป๐˜€๐—ถ๐—ด๐—ต๐˜๐˜€:
๐ŸŽฎ๐Ÿค” Human players can barely tell the difference between short clips (3 seconds) of the real game or the simulation
๐Ÿง  The model maintains game state (health, ammo, etc.) over long periods despite having only 3 seconds of effective context length
๐Ÿ”„ They use "noise augmentation" during training to prevent quality degradation in long play sessions
๐Ÿš€ The game runs on one TPU at 20 FPS with 4 denoising steps, or 50 FPS with model distillation (with some quality loss)

The researchers did not open-source the code, but I feel like we've just seen a part of the future being written!
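The "noise augmentation" mentioned in the key insights can be illustrated with a small sketch. The summary does not describe the exact scheme, so the version below is an assumption: during training, the context frames are corrupted with Gaussian noise of a randomly sampled strength, and that strength is returned so it could be given to the model as an extra input. `augment_context` and `MAX_NOISE_LEVEL` are hypothetical names and values.

```python
import random

MAX_NOISE_LEVEL = 0.7  # hypothetical upper bound on the noise strength

def augment_context(context_frames, rng, max_level=MAX_NOISE_LEVEL):
    """Corrupt context frames with Gaussian noise of a sampled strength.

    Returns the noisy frames and the sampled level. Training on such
    corrupted context is meant to make the model tolerant of its own
    imperfect outputs when they are fed back at inference time.
    """
    level = rng.uniform(0.0, max_level)
    noisy = [[px + rng.gauss(0.0, level) for px in frame] for frame in context_frames]
    return noisy, level

rng = random.Random(0)
clean = [[0.0, 0.5, 1.0], [0.1, 0.6, 0.9]]
noisy, level = augment_context(clean, rng)
print(len(noisy), len(noisy[0]), 0.0 <= level <= MAX_NOISE_LEVEL)
```

The intuition is that at inference time the context consists of generated frames, which are never perfectly clean, so a model that has only ever seen pristine context during training may drift as errors accumulate.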
