Quick note on recent QAT issues
Original: Quick note on the QAT of recent
A LocalLLaMA user claims recent Google QAT quantization is flawed and suggests using Unsloth UD Q4_K_XL for now.
The post argues that recent Google QAT quantization has several implementation problems. Token embeddings are reportedly quantized to Q6_K (written "q6k" in the post) instead of using a pure mode. The author also claims that llama-quantize has a hardcoded parameter that does not match some optimized groups, and that 32-block groups are misaligned. The author recommends Unsloth UD Q4_K_XL as a temporary option and says they are working on a patch.
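This short post from r/LocalLLaMA mainly warns that recent Google-related QAT quantization results may have technical flaws. The author's conclusion is blunt: they consider Google's quantization pipeline "broken" and recommend using Unsloth UD Q4_K_XL for now. The post provides no full benchmarks, code, or reproduction steps, so it should be read as a community member's technical observation and preliminary judgment, not as a formally verified announcement.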
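The post itself includes no code, so the following is only an illustrative sketch of why group alignment could matter for QAT weights. It assumes that "32-block groups" means groups of 32 consecutive weights sharing one scale, and that QAT leaves each group sitting exactly on its own 4-bit grid. The quantizer is a simplified symmetric scheme written for this example, not llama.cpp's implementation, and the scales and integer pattern are made up.

```python
BLOCK = 32  # assumed group size: 32 consecutive weights share one scale


def quantize_block(block):
    """Round-trip one block through a symmetric 4-bit grid (integers in [-7, 7])."""
    amax = max(abs(w) for w in block)
    if amax == 0.0:
        return list(block)
    scale = amax / 7.0
    return [max(-7, min(7, round(w / scale))) * scale for w in block]


def roundtrip(weights, offset=0):
    """Quantize and dequantize with group boundaries shifted by `offset` elements."""
    edges = [0] + list(range(offset or BLOCK, len(weights), BLOCK)) + [len(weights)]
    out = []
    for lo, hi in zip(edges, edges[1:]):
        out.extend(quantize_block(weights[lo:hi]))
    return out


def mean_abs_error(a, b):
    """Average absolute difference between two equally long weight lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Toy "QAT" weights: each group of 32 already sits exactly on its own 4-bit grid.
# The two alternating scales and the integer pattern are hypothetical.
scales = [0.010, 0.013] * 4
weights = [((i * 7) % 15 - 7) * s for s in scales for i in range(BLOCK)]

aligned_err = mean_abs_error(weights, roundtrip(weights, offset=0))
shifted_err = mean_abs_error(weights, roundtrip(weights, offset=16))
print(f"aligned groups: {aligned_err:.2e}")
print(f"shifted groups: {shifted_err:.2e}")
```

When the quantizer's groups line up with the groups the weights were trained on, the round trip is lossless up to floating-point noise. Shifting the boundaries by half a group makes each block straddle two different scales, so the weights no longer fit one grid and the error rises sharply. Whether this is the mechanism behind the problem described in the post is not confirmed by the source.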
Summaries are AI-generated; the original article is authoritative.