[3090] Gemma4 QAT + MTP quick TPS numbers
Original: [3090] Gemma4 QAT + MTP quick TPS numbers [TLDR 1.2-1.8x better]
A LocalLLaMA user reports 1.2-1.8x TPS gains from Gemma4 QAT plus MTP on an RTX 3090.
An r/LocalLLaMA user shared quick throughput numbers for Gemma4 QAT with MTP speculative decoding on an RTX 3090 24GB setup. They report roughly a 1.2-1.8x TPS improvement, with Gemma 4 31B moving from about 40 tok/s to 70-80 tok/s. The author frames this as a rough benchmark covering 11 task categories and notes run-to-run variation from sampling at temperature 1.0.
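To put the quoted Gemma 4 31B figures in ratio terms, here is a minimal sketch of the speedup arithmetic. The function name `speedup` is illustrative and not from the post; the inputs are the approximate tok/s values quoted above, not new measurements.

```python
def speedup(baseline_tps: float, new_tps: float) -> float:
    """Return the throughput ratio new_tps / baseline_tps."""
    if baseline_tps <= 0:
        raise ValueError("baseline_tps must be positive")
    return new_tps / baseline_tps


# Approximate figures quoted in the summary for Gemma 4 31B on an RTX 3090.
BASELINE_TPS = 40.0            # about 40 tok/s before MTP
MTP_TPS_RANGE = (70.0, 80.0)   # 70-80 tok/s with MTP

if __name__ == "__main__":
    for tps in MTP_TPS_RANGE:
        print(f"{BASELINE_TPS:.0f} -> {tps:.0f} tok/s: {speedup(BASELINE_TPS, tps):.2f}x")
    # 40 -> 70 tok/s: 1.75x
    # 40 -> 80 tok/s: 2.00x
```

Taken at face value, the rounded 31B numbers land at or slightly above the top of the reported 1.2-1.8x range, which is consistent with the author describing the benchmark as rough.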
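This r/LocalLLaMA post is a user's quick performance writeup for Gemma4 on an RTX 3090 24GB card, focused on the inference speed gains from QAT and MTP. The author argues that recent weeks have been good for local-model users with 24GB of VRAM or less, citing the release of models such as Gemma 4 and Qwen 3.6, QAT delivering a nearly "free" boost in intelligence, and MTP speculative decoding adding extra speed. The author's subjective impression is that users once labeled "GPU poor" are starting to feel less constrained at the 24GB tier.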