arxiv:2408.14837

Diffusion Models Are Real-Time Game Engines

Published on Aug 27

· Submitted by

akhaliq on Aug 28

#2 Paper of the day


Authors: Dani Valevski, Yaniv Leviathan, Moab Arar, Shlomi Fruchter

Abstract

We present GameNGen, the first game engine powered entirely by a neural model that enables real-time interaction with a complex environment over long trajectories at high quality. GameNGen can interactively simulate the classic game DOOM at over 20 frames per second on a single TPU. Next frame prediction achieves a PSNR of 29.4, comparable to lossy JPEG compression. Human raters are only slightly better than random chance at distinguishing short clips of the game from clips of the simulation. GameNGen is trained in two phases: (1) an RL-agent learns to play the game and the training sessions are recorded, and (2) a diffusion model is trained to produce the next frame, conditioned on the sequence of past frames and actions. Conditioning augmentations enable stable auto-regressive generation over long trajectories.
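To put the reported PSNR of 29.4 in context, here is a minimal sketch of how PSNR is conventionally computed and what per-pixel error that figure implies. It assumes 8-bit pixel values (peak 255), which the abstract does not state; the function names `psnr` and `rmse_for_psnr` are illustrative, not taken from the paper.

```python
import math


def psnr(reference, prediction, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences.

    Assumption: pixels are flat sequences of numbers on a 0..max_value scale.
    """
    if len(reference) != len(prediction) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixel values.
    mse = sum((r - p) ** 2 for r, p in zip(reference, prediction)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)


def rmse_for_psnr(psnr_db, max_value=255.0):
    """Root-mean-square pixel error implied by a given PSNR (inverse of the formula above)."""
    return max_value / (10.0 ** (psnr_db / 20.0))
```

Under the 8-bit assumption, `rmse_for_psnr(29.4)` comes out at roughly 8.6 gray levels out of 255, which gives a feel for how close a predicted frame is to the true next frame.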


Community

akhaliq

Paper submitter 23 days ago

https://gamengen.github.io/

shuaishuaicdp

23 days ago

Amazing work!

BaoLocTown

23 days ago (edited 21 days ago)

Is this the real life?

patmcguinness

22 days ago

Is this just fantasy?

kudymov

22 days ago

Amazing

AdinaY

22 days ago

🔥🔥🔥

4eJIoBek

22 days ago

We need similar work, but on yume nikki fangames


m-ric

21 days ago

I did a post about this paper! Here's the summary:

🎮 𝗔 𝗻𝗲𝘂𝗿𝗮𝗹 𝗻𝗲𝘁𝘄𝗼𝗿𝗸 𝘀𝗶𝗺𝘂𝗹𝗮𝘁𝗲𝘀 𝗗𝗢𝗢𝗠: 𝗼𝗽𝗲𝗻𝗶𝗻𝗴 𝘁𝗵𝗲 𝘄𝗮𝘆 𝗳𝗼𝗿 𝗰𝗼𝗺𝗽𝗹𝗲𝘁𝗲𝗹𝘆-𝗔𝗜-𝗴𝗲𝗻𝗲𝗿𝗮𝘁𝗲𝗱 𝗴𝗮𝗺𝗲𝘀!

Imagine if games were generated live, in their entirety, by an AI model: the NPCs and their dialogue, the storyline, and even the game environment. The player’s in-game actions would have a real, lasting impact on the game story.

In a very exciting paper, Google researchers just gave us the first credible glimpse of this future.

➡️ They created GameNGen, the first neural model that can simulate a complex 3D game in real time. They use it to simulate the classic game DOOM at over 20 frames per second on a single TPU, with next-frame image quality (PSNR of 29.4) comparable to lossy JPEG compression. And it feels just like the real game!

Here's how they did it:

1. They trained an RL agent to play DOOM and recorded its gameplay sessions.
2. They then used these recordings to train a diffusion model to predict the next frame, based on past frames and player actions.
3. During inference, they use only 4 denoising steps (instead of the usual dozens) to generate each frame quickly.
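The inference loop described in these steps can be sketched roughly as follows. This is a toy illustration, not the authors' code (which was not released): `toy_denoiser` is a hypothetical stand-in for the trained diffusion model, frames are flat lists of floats, and `CONTEXT_LEN` is an assumed value. Only the 4 denoising steps come from the summary above.

```python
import random
from collections import deque

CONTEXT_LEN = 8       # assumption: how many past frames/actions the model sees
DENOISING_STEPS = 4   # from the summary: 4 denoising steps per frame


def toy_denoiser(noisy, frames, actions, step):
    """Hypothetical stand-in for the trained model.

    Moves the noisy frame halfway toward the latest context frame,
    shifted by the latest action value.
    """
    target = [p + actions[-1] for p in frames[-1]]
    return [n + 0.5 * (t - n) for n, t in zip(noisy, target)]


def generate_frame(frames, actions, denoiser, rng, steps=DENOISING_STEPS):
    """Produce one frame: start from Gaussian noise, then refine it `steps` times."""
    frame = [rng.gauss(0.0, 1.0) for _ in frames[-1]]
    for step in range(steps):
        frame = denoiser(frame, list(frames), list(actions), step)
    return frame


def rollout(first_frame, player_actions, denoiser=toy_denoiser, seed=0):
    """Auto-regressive loop: every generated frame is fed back in as context."""
    rng = random.Random(seed)
    frames = deque([first_frame], maxlen=CONTEXT_LEN)  # sliding window of past frames
    actions = deque(maxlen=CONTEXT_LEN)                # sliding window of past actions
    output = []
    for action in player_actions:
        actions.append(action)
        frame = generate_frame(frames, actions, denoiser, rng)
        frames.append(frame)  # the model's own output becomes future context
        output.append(frame)
    return output
```

At 20 frames per second, each pass through the outer loop has a budget of 1/20 s = 50 ms, which is why the number of denoising steps per frame matters so much for real-time play.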

𝗞𝗲𝘆 𝗶𝗻𝘀𝗶𝗴𝗵𝘁𝘀:
🎮🤔 Human raters can barely tell the difference between short clips (3 seconds) of the real game and clips of the simulation
🧠 The model maintains game state (health, ammo, etc.) over long periods despite having only 3 seconds of effective context length
🔄 They use "noise augmentation" during training to prevent quality degradation in long play sessions
🚀 The game runs on one TPU at over 20 FPS with 4 denoising steps, or at 50 FPS with model distillation (with some quality loss)
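The "noise augmentation" idea can be sketched like this: during training, the past frames used as conditioning are deliberately corrupted with a random amount of Gaussian noise, and the amount is handed to the model as an extra input, so it learns to cope with its own imperfect outputs at inference time. This is a minimal sketch under assumptions: the names, the `MAX_NOISE_LEVEL` value, and the list-of-floats frame format are illustrative, not taken from the authors' (unreleased) code.

```python
import random

MAX_NOISE_LEVEL = 0.7  # assumption: upper bound on the corruption strength


def augment_context(context_frames, rng, max_level=MAX_NOISE_LEVEL):
    """Corrupt the conditioning frames with Gaussian noise of a random strength.

    Returns new noisy frames (the inputs are not modified) and the sampled
    level, so the level can be given to the model as an extra input.
    """
    level = rng.uniform(0.0, max_level)
    noisy = [[p + rng.gauss(0.0, level) for p in frame] for frame in context_frames]
    return noisy, level


def make_training_example(context_frames, actions, target_frame, rng):
    """Bundle one training example: noisy context in, clean next frame as target."""
    noisy_context, level = augment_context(context_frames, rng)
    return {
        "context": noisy_context,  # corrupted past frames
        "noise_level": level,      # tells the model how corrupted the context is
        "actions": actions,        # past actions are left untouched
        "target": target_frame,    # the clean frame the model should predict
    }
```

Only the context is corrupted; the target frame stays clean, so the model is trained to recover a good next frame even when its history is degraded.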

The researchers did not open-source the code, but I feel like we’ve just seen a part of the future being written!