[3090] Gemma4 QAT + MTP quick TPS numbers

Original: [3090] Gemma4 QAT + MTP quick TPS numbers [TLDR 1.2-1.8x better]

A LocalLLaMA user reports 1.2-1.8x TPS gains from Gemma4 QAT plus MTP on an RTX 3090.

An r/LocalLLaMA user shared quick throughput numbers for Gemma 4 QAT with MTP speculative decoding on an RTX 3090 24GB setup. They report a roughly 1.2-1.8x TPS improvement, with Gemma 4 31B moving from about 40 tok/s to 70-80 tok/s. The author frames this as a rough benchmark: it covers 11 task categories, and results vary from run to run because sampling used a temperature of 1.0.
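To make the quoted figures concrete, the speedup is the ratio of the new throughput to the baseline. The sketch below applies that to the approximate Gemma 4 31B numbers reported above (about 40 tok/s before, 70-80 tok/s after). The function name `speedup` is a hypothetical helper for illustration, not something from the original post.

```python
def speedup(baseline_tps: float, new_tps: float) -> float:
    """Return the throughput ratio new_tps / baseline_tps."""
    if baseline_tps <= 0:
        raise ValueError("baseline_tps must be positive")
    return new_tps / baseline_tps


# Approximate figures quoted in the summary for Gemma 4 31B on an RTX 3090.
baseline = 40.0                 # tok/s without MTP (approximate)
mtp_low, mtp_high = 70.0, 80.0  # tok/s with MTP (approximate range)

low = speedup(baseline, mtp_low)
high = speedup(baseline, mtp_high)
print(f"{low:.2f}x to {high:.2f}x")
```

Taken alone, the 31B figures work out to about 1.75x to 2.0x, slightly above the top of the headline 1.2-1.8x range. That fits the author's own description of the numbers as rough.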

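This r/LocalLLaMA post is a user's quick performance report on Gemma4 running on an RTX 3090 24GB card, focused on the inference speed gains from QAT and MTP. The author argues that the past few weeks have been good for local-model enthusiasts with 24GB of VRAM or less, citing the release of models such as Gemma 4 and Qwen 3.6, the nearly "free" intelligence gain from QAT, and the extra speed from MTP speculative decoding. The author's subjective impression is that users once labeled "GPU poor" are starting to feel less constrained at the 24GB tier.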

Summaries are AI-generated; the original article is authoritative.