Hacker News (AI keywords) | Jun 5, 2026, 4:18 PM | theanonymousone | Important 72

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

Original: Gemma 4 QAT models: Optimizing compression for mobile and laptop efficiency

Google released Gemma 4 QAT checkpoints to reduce memory use for local mobile, laptop, and GPU inference.

Google released new Gemma 4 checkpoints trained with Quantization-Aware Training (QAT) so that quality is preserved after compression. The release includes Q4_0 checkpoints and a mobile-focused quantization format that can reduce Gemma 4 E2B memory use to about 1 GB, or below 1 GB for a text-only configuration. The models are available on Hugging Face and are supported by llama.cpp, Ollama, LM Studio, LiteRT-LM, Transformers.js, SGLang, vLLM, MLX, and Unsloth.
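As a rough illustration of why 4-bit checkpoints shrink memory this much, the sketch below estimates weight storage for the Q4_0 block layout used by llama.cpp (32 weights per block, stored as 16 bytes of packed 4-bit values plus a 2-byte fp16 scale, which works out to 4.5 bits per weight). The parameter count `ASSUMED_PARAMS` is a hypothetical round number chosen for the arithmetic, not a figure from the release, and the estimate ignores activations, KV cache, and any tensors kept at higher precision.

```python
# Back-of-the-envelope weight-memory estimate for Q4_0 versus bf16.
# ASSUMED_PARAMS is an illustrative assumption, not a published figure.

Q4_0_BLOCK_WEIGHTS = 32  # weights per Q4_0 block
Q4_0_BLOCK_BYTES = 18    # 16 bytes of packed 4-bit values + 2-byte fp16 scale


def q4_0_bytes(n_params: int) -> int:
    """Bytes needed to store n_params weights in Q4_0 blocks."""
    blocks = -(-n_params // Q4_0_BLOCK_WEIGHTS)  # ceiling division
    return blocks * Q4_0_BLOCK_BYTES


def bf16_bytes(n_params: int) -> int:
    """Bytes needed to store n_params weights at 16 bits each."""
    return n_params * 2


ASSUMED_PARAMS = 2_000_000_000  # hypothetical parameter count

if __name__ == "__main__":
    q4 = q4_0_bytes(ASSUMED_PARAMS)
    bf = bf16_bytes(ASSUMED_PARAMS)
    print(f"Q4_0: {q4 / 1e9:.3f} GB")
    print(f"bf16: {bf / 1e9:.3f} GB")
    print(f"ratio: {bf / q4:.2f}x smaller")
```

Under that assumed count, Q4_0 weights take roughly a quarter of the space of 16-bit weights; the real footprint depends on the actual parameter count and on which layers are quantized.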

Summary compiled by AI; the original article takes precedence.