r/LocalLLaMA top day · Jun 9, 2026, 1:28 AM · /u/gcavalcante8808

JetBrains Mellum 2: a really good and performant model

A Reddit user reports strong local speed and tool-use results from JetBrains Mellum 2 on AMD hardware.

An r/LocalLLaMA user shared informal impressions of JetBrains Mellum 2, focusing on local coding-style tasks and tool calls. On an AMD Radeon RX 7900 XT with llama.cpp Vulkan and a 131K context, the model reportedly generated around 111 tokens/s and stayed above 100 tokens/s near full context. The author stresses that this is not a scientific benchmark but a practical, workflow-oriented test.

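This r/LocalLLaMA post gives one user's hands-on impressions of JetBrains Mellum 2. It is not a rigorous academic benchmark; it judges the model's practical usefulness through everyday development tasks, tool calls, and local inference speed. The author tested JetBrains/Mellum2-12B-A2.5B-Thinking, a 12B MoE model with about 2.5B parameters active at a time. The test setup was an AMD Radeon RX 7900 XT 20GB, an AMD Ryzen 9 3900X, and 128GB of DDR4 RAM, with llama.cpp Vulkan b9544 as the backend, context set to 131,072 tokens, and a bf16 KV cache. The author reports prompt eval at about 492.7 tokens/s and generation at about 111.2 tokens/s, or roughly 9 ms/token, and says generation speed did not fall below 100 tokens/s even at around 130K context.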
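To put the reported throughput figures in more familiar terms, here is a minimal Python sketch that converts tokens/s into per-token latency and wall-clock time. The helper names are hypothetical, and the full-context estimate assumes prompt eval speed stays constant across the whole 131,072-token window, which the post does not claim.

```python
# Figures as reported in the post (informal, not a scientific benchmark).
PROMPT_EVAL_TPS = 492.7    # prompt eval, tokens per second
GENERATION_TPS = 111.2     # generation, tokens per second
CONTEXT_TOKENS = 131_072   # configured context size


def ms_per_token(tokens_per_second: float) -> float:
    """Convert throughput in tokens/s to average latency per token in ms."""
    return 1000.0 / tokens_per_second


def seconds_for(num_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock seconds to process num_tokens at a constant rate."""
    return num_tokens / tokens_per_second


# Generation latency: matches the "about 9 ms/token" quoted in the post.
gen_latency_ms = ms_per_token(GENERATION_TPS)

# ASSUMPTION: constant prompt eval speed over the entire context window.
# The post only reports a single prompt eval figure, so this is a rough
# estimate of how long filling the full context would take.
full_prompt_seconds = seconds_for(CONTEXT_TOKENS, PROMPT_EVAL_TPS)

# Time to generate a hypothetical 1,000-token reply at the reported rate.
reply_seconds = seconds_for(1_000, GENERATION_TPS)

print(f"generation latency: {gen_latency_ms:.1f} ms/token")
print(f"full-context prompt eval: {full_prompt_seconds / 60:.1f} min")
print(f"1,000-token reply: {reply_seconds:.1f} s")
```

Under that constant-rate assumption, ingesting a completely full context would take a few minutes, while generation itself stays at interactive speed.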

Summaries are AI-generated; the original article is authoritative.