r/LocalLLaMA top day · Jun 8, 2026, 10:02 PM · /u/dreamkast06

Quick note on recent QAT issues

Original: Quick note on the QAT of recent

A LocalLLaMA user claims recent Google QAT quantization is flawed and suggests using Unsloth UD Q4_K_XL for now.

The post argues that recent Google QAT quantization has several implementation problems. It lists three: token embeddings are quantized to q6k instead of using a pure mode, llama-quantize has a hardcoded parameter that does not match some of the optimized groups, and the 32-block groups are misaligned. The author recommends Unsloth UD Q4_K_XL as a temporary option and says they are working on a patch.
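To make the group-alignment claim concrete, here is a minimal Python sketch of why block boundaries matter for weights trained with per-group scales. It is an illustration under stated assumptions, not llama.cpp's K-quant format and not Google's QAT recipe: the block size of 32, the 4-bit symmetric range, the synthetic weights, and the helper names `block_ranges` and `fake_quant` are all hypothetical. The post does not say how the groups are misaligned, so the 16-element shift is also an assumption.

```python
BLOCK = 32   # assumed group size, for illustration only
QMAX = 7     # assumed symmetric 4-bit range: integers in [-7, 7]

def block_ranges(n, block=BLOCK, offset=0):
    """Return (start, end) pairs; offset shifts where block boundaries fall."""
    edges = sorted({0, n, *range(offset, n, block)})
    return list(zip(edges[:-1], edges[1:]))

def fake_quant(values, block=BLOCK, offset=0, qmax=QMAX):
    """Quantize each block with its own absmax scale, then dequantize."""
    out = []
    for lo, hi in block_ranges(len(values), block, offset):
        chunk = values[lo:hi]
        amax = max(abs(v) for v in chunk)
        scale = amax / qmax if amax else 1.0
        out.extend(round(v / scale) * scale for v in chunk)
    return out

def max_abs_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

# Synthetic "QAT-style" weights: each aligned block of 32 already sits
# exactly on a 4-bit grid with its own scale (alternating 1/8 and 1/32).
scales = [1 / 8, 1 / 32, 1 / 8, 1 / 32]
weights = [((i * 4) % 15 - 7) * s for s in scales for i in range(BLOCK)]

# Boundaries that match the training groups reproduce the weights exactly.
aligned_err = max_abs_error(weights, fake_quant(weights, offset=0))

# Boundaries shifted by 16 mix two different scales inside one block,
# so the finer-scaled half is rounded onto the coarser grid.
shifted_err = max_abs_error(weights, fake_quant(weights, offset=16))

print("aligned max error:", aligned_err)
print("shifted max error:", shifted_err)
```

In this toy setup the aligned pass is lossless while the shifted pass is not, which is the general kind of accuracy loss a group mismatch between training and quantization could cause. Whether this is what happens in the files the post criticizes is not established by the post itself.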

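This short post from r/LocalLLaMA is mainly a warning that recent Google-related QAT quantization results may have technical flaws. The author's conclusion is blunt: they consider Google's quantization process "broken" and recommend using Unsloth UD Q4_K_XL for now. The post provides no full benchmarks, code, or reproduction steps, so it should be read as a community member's technical observation and preliminary judgment, not as a formally verified announcement.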

open-source other llama-quantize unsloth #qat #quantization #local-llm #gguf #unsloth

Summaries are AI-generated; the original article is authoritative.