Simon Willison's Weblog · Jun 10, 2026, 8:00 PM

DiffusionGemma: Google Launches High-Speed Open-Weight Gemma Diffusion Model

Original: DiffusionGemma

Google’s Gemini Diffusion research returns as an Apache 2 open-weight Gemma model.

Simon Willison highlights Google’s new DiffusionGemma, an Apache 2 licensed open-weight Gemma model. He connects it to last year’s brief Gemini Diffusion preview, which he measured at 857 tokens per second. NVIDIA is currently hosting the model for free on its NIM cloud API, where Willison generated 2,409 tokens in 4.4 seconds, implying at least 500 tokens per second.

Simon Willison’s short post covers the background to Google’s DiffusionGemma and his early hands-on results. In May of last year, Google briefly previewed an experimental Gemini Diffusion model. Willison tested that preview at the time and recorded a generation speed of 857 tokens per second. He found that performance highly promising, but Google published no further public updates on the research.

That research direction has now returned in a more usable form. Google has released a new open-weight Gemma model, google/diffusiongemma-26B-A4B-it, under the Apache 2 license, so developers and researchers can use and experiment with it under relatively permissive terms.

The post also notes that NVIDIA is currently hosting the model for free through its NIM cloud API. Willison tested it through that API, asking the model for output related to a “pelican riding a bicycle,” and timed the run with the command time uv run generate.py. The model returned 2,409 tokens in 4.4 seconds, which works out to roughly 547 tokens per second end to end, so at least 500 tokens per second.

The article does not go into model architecture, benchmark scores, or quality comparisons. Its key point is that Google’s previously quiet Gemini Diffusion research has reappeared as an open-weight, Apache 2-licensed Gemma model, and that early API testing shows very high text-generation speed. For developers and ML engineers, this suggests that diffusion-based text generation may be moving beyond the demonstration stage to one where it can be practically tested, deployed, and integrated.
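The throughput figure quoted above is simple division of tokens by elapsed seconds. As a minimal sketch of that arithmetic (the function name tokens_per_second is hypothetical and is not taken from Willison's generate.py, whose contents the summary does not show):

```python
def tokens_per_second(token_count: int, elapsed_seconds: float) -> float:
    """Average throughput over a whole run.

    Hypothetical helper for illustration only; it is not part of the
    original post or of any NVIDIA or Google API.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return token_count / elapsed_seconds


# Figures reported in the summary: 2,409 tokens in 4.4 seconds.
reported = tokens_per_second(2409, 4.4)
print(f"{reported:.1f} tokens/second")
```

Because the 4.4 seconds came from timing the whole command, it presumably includes process startup and network round trips as well as generation, which is one reason to read the result as a lower bound ("at least 500") rather than as the model's raw decoding speed.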

Summaries are AI-generated; the original article is authoritative.