r/LocalLLaMA top day · Jun 8, 2026, 7:52 PM · /u/OsmanthusBloom

Qwen3.6-35B-A3B Tool Calling Benchmark: ByteShape vs Unsloth GGUFs

Original: Qwen3.6-35B-A3B tool calling benchmark: ByteShape vs. Unsloth GGUFs, KV cache quants & long context performance

A tool-calling benchmark finds q8_0 KV cache nearly free, q4_0 worse, and long context broadly harmful.

The post benchmarks eight Qwen3.6-35B-A3B GGUF quants from ByteShape and Unsloth using llama.cpp and tool-eval-bench. It compares f16, q8_0, and q4_0 KV cache quantization under short- and long-context conditions, totaling 144 runs and roughly 300 GPU-hours. The author reports no clear winner between ByteShape and Unsloth, q8_0 as close to a free lunch, q4_0 as weaker, and long context as a major cause of tool-calling degradation.

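This post is a benchmark of Qwen3.6-35B-A3B's tool-calling ability. Its focus is not general perplexity or speed but the quality of the model's output in real tool-calling scenarios. Inspired by earlier r/LocalLLaMA discussions, the author set out to answer three questions: whether ByteShape's claim that quantization at roughly 4 bpw retains more than 99% of the unquantized model's benchmark score holds up; whether KV cache quantization hurts performance on real tasks; and whether long context changes the conclusions.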
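
As a rough illustration of the comparison described above, the sketch below enumerates one llama.cpp server configuration per quant and KV cache type. It is not the author's harness: the model file names and the context size are hypothetical placeholders, and only the `-m`, `-c`, `--cache-type-k`, and `--cache-type-v` options of `llama-server` are assumed. Depending on the llama.cpp build, a quantized V cache may also need flash attention enabled.

```python
from itertools import product

# Hypothetical placeholder names; the eight GGUF files used in the post
# are not listed in this summary.
QUANTS = [f"quant-{i}.gguf" for i in range(1, 9)]

# The three KV cache types compared in the benchmark.
KV_CACHE_TYPES = ["f16", "q8_0", "q4_0"]


def server_args(model_path, kv_type, ctx_size=32768):
    """Build a llama-server argument list for one quant / KV cache pair.

    ctx_size is an assumed value, not one taken from the post.
    """
    return [
        "llama-server",
        "-m", model_path,
        "-c", str(ctx_size),
        "--cache-type-k", kv_type,  # quantization of the K cache
        "--cache-type-v", kv_type,  # quantization of the V cache
    ]


def run_matrix():
    """Return one argument list per (quant, KV cache type) combination."""
    return [server_args(m, kv) for m, kv in product(QUANTS, KV_CACHE_TYPES)]


if __name__ == "__main__":
    for args in run_matrix():
        print(" ".join(args))
```

Eight quants times three cache types gives 24 base configurations. How those expand to the 144 runs reported is not detailed in this summary, so the sketch stops at the base matrix.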


qwen open-source llama-cpp tool-eval-bench byteshape unsloth #tool-calling #gguf #quantization #kv-cache #long-context

Summaries are AI-generated; the original article is authoritative.