r/LocalLLaMA top dayJun 7, 2026, 8:13 PM/u/SteppenAxolotl
Qwen 3.6 27B DeepSWE Benchmark Results Highlight Gap Between Local and Closed-Source Models
Original: Qwen 3.6 27B on DeepSWE
Qwen 3.6 27B scored 1.79% on the DeepSWE benchmark, highlighting the persistent performance gap between local open-source and closed-source models.
A community benchmark of Qwen 3.6 27B on DeepSWE yielded a score of 1.79% (18/20th place), slightly outperforming Haiku 4.5. Run on a single RTX 6000 Blackwell GPU via vLLM with reasoning enabled, the test averaged 32 minutes and 44k output tokens per task. The author notes that while Qwen 3.6 27B represents a 'poor man's local SOTA,' the massive gap compared to frontier closed models suggests local LLMs are struggling to keep pace in complex coding.
想看英文原文 / 完整內容?
前往 r/LocalLLaMA top day 原文 →摘要由 AI 整理,以原文為準。