Latest in AI

Showing:inference-speedStudentsClear ×

Topic

Release New Tool Tutorial Business Paper Benchmark Opinion Regulation

For

General Developers Designers Product Founders Marketing Researchers Students

[3090] Gemma4 QAT + MTP quick TPS numbers
r/LocalLLaMA top day50 days agoBenchmark
A r/LocalLLaMA user shared quick throughput numbers for Gemma4 QAT with MTP speculative decoding on an RTX 3090 24GB setup. They report roughly 1.2-1.8x TPS improvement, with Gemma 4 31B moving from about 40 tok/s to 70-80 tok/s. The author frames this as a rough benchmark, using 11 task categories and noting stochastic variation from temp 1.0.