DiffusionGemma: Google Launches High-Speed Open-Weight Gemma Diffusion Model | EveryCorner

Simon Willison’s short post outlines the background to Google’s DiffusionGemma release and reports some early hands-on results. Last May, Google briefly released an experimental Gemini Diffusion model. Willison tested the preview at the time and recorded a generation speed of 857 tokens per second. He found that performance highly promising, but Google gave no further public updates on the research afterwards.

Now that research direction has returned in a more usable form. Google has released a new open-weight Gemma model, google/diffusiongemma-26B-A4B-it, under the Apache 2 license, so developers and researchers can use and experiment with it on relatively permissive terms. The post also notes that NVIDIA is currently hosting the model for free through its NIM cloud API.

Willison tested the model through that API, asking it to generate output related to a “pelican riding a bicycle,” and measured the run with time uv run generate.py. The model returned 2,409 tokens in 4.4 seconds, or roughly 547 tokens per second. Because that wall-clock time also covers script startup and the network round trip, the figure is a lower bound: the model generates at least 500 tokens per second.

The article does not go deeply into the model’s architecture, benchmark scores, or quality comparisons. Its key point is that Google’s previously quiet Gemini Diffusion research has reappeared as an open-weight, Apache 2-licensed Gemma model, and that early API testing shows very high text-generation speed. For developers and ML engineers, this suggests that diffusion-based text generation may be moving beyond a demonstration and into a stage where it can be practically tested, deployed, and integrated.
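To make the throughput arithmetic concrete, here is a minimal Python sketch. It assumes throughput is simply completion tokens divided by elapsed wall-clock seconds, which is what a time uv run generate.py measurement gives you. The helpers tokens_per_second and timed are hypothetical names used for illustration only; they are not part of any NVIDIA NIM or Google API, and the generation callable is a stand-in for whatever client code actually calls the model.

```python
import time
from typing import Callable, Tuple


def tokens_per_second(tokens: int, seconds: float) -> float:
    """Throughput as completion tokens divided by elapsed wall-clock seconds."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return tokens / seconds


def timed(generate: Callable[[], int]) -> Tuple[int, float]:
    """Run a hypothetical generation callable that returns its token count,
    timing it with a monotonic high-resolution clock."""
    start = time.perf_counter()
    tokens = generate()
    elapsed = time.perf_counter() - start
    return tokens, elapsed


if __name__ == "__main__":
    # Figures reported in the post: 2,409 tokens in 4.4 seconds.
    rate = tokens_per_second(2409, 4.4)
    print(f"{rate:.1f} tokens/s")  # prints 547.5 tokens/s

    # Timing a stub in place of a real API call (assumption: the real client
    # would return the number of completion tokens it received).
    n, secs = timed(lambda: 42)
    print(n, secs >= 0)
```

Timing the whole process, as the shell's time command does, folds interpreter startup and network latency into the denominator, so any rate computed this way understates the model’s own decoding speed. Timing only the request inside the script, as timed does, narrows that gap but still includes the network round trip.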