llama.cpp Gemma4 MTP Support Merged
r/LocalLLaMA top day·yesterday·Release
llama.cpp PR #23398 was merged on June 7, 2026, adding multi-token prediction (MTP) support for Gemma4 models.
The PR author reports an average speedup of more than 2x on dense models, no observed speedup on MoE models, and replicated AIME-26 results of around 87%.
Support currently covers the 31B and 26B-4B variants; E4B and E2B are not yet supported. Multi-GPU setups may need extra draft-device configuration.