Latest in AI

Showing: moe

Unsloth releases GGUF version of Cohere North-Mini-Code 1.0 (30B A3B MoE) on Hugging Face
r/LocalLLaMA top day · 48 days ago · Release
Unsloth uploaded a GGUF version of Cohere's North-Mini-Code 1.0 to Hugging Face, making local inference possible for this 30B A3B MoE coding-focused model. The poster links the release to llama.cpp PR #24260, suggesting new architecture support may be required. No benchmarks or test results have been shared yet; this is an early community resource post.
JetBrains Mellum 2: a really good and performant model
r/LocalLLaMA top day · 49 days ago · Benchmark
An r/LocalLLaMA user shared informal impressions of JetBrains Mellum 2, focusing on local coding-style tasks and tool calls. On an AMD Radeon RX 7900 XT with llama.cpp Vulkan and 131K context, the model reportedly generated around 111 tokens/s and stayed above 100 tokens/s near full context. The author stresses that this is a practical, workflow-oriented test rather than a scientific benchmark.
Xiaomi Claims 1,000+ TPS on a 1T Model Using a Standard 8-GPU Server ★ 72
r/LocalLLaMA top day · 49 days ago · Benchmark
Xiaomi announced MiMo-V2.5-Pro-UltraSpeed with TileRT, claiming over 1,000 tokens/s decode speed on a 1-trillion-parameter MoE model. The company says it runs on a single standard 8-GPU commodity node, not wafer-scale or SRAM-heavy specialized hardware. The claimed stack combines FP4 MoE expert quantization, DFlash speculative decoding, and TileRT low-latency inference kernels, but independent validation is still needed.
Introducing Mistral 3 ★ 84
Mistral AI News · 50 days ago · Release
Mistral AI introduced Mistral 3, a new open model family under Apache 2.0. It includes Mistral Large 3, a 675B-parameter sparse MoE with 41B active parameters, plus Ministral 3 models at 3B, 8B, and 14B. The release targets frontier open-weight use, multimodal and multilingual workflows, enterprise customization, and efficient local or edge deployments.
Introducing Mistral Small 4 ★ 76
Mistral AI News · 50 days ago · Release
Mistral AI introduced Mistral Small 4 as the next major release in the Mistral Small family. It combines reasoning, multimodal, and agentic coding capabilities into one open model with configurable reasoning effort. The model uses a MoE architecture, supports a 256k context window and text-image inputs, and is available through Mistral API, AI Studio, Hugging Face, NVIDIA NIM, and common inference stacks.
Introducing Mistral 3 ★ 78
Mistral AI News · 50 days ago · Release
Mistral AI introduced Mistral 3, a new open model family including Mistral Large 3 and Ministral 3 models at 3B, 8B, and 14B sizes. Large 3 is a 675B-parameter sparse MoE model with 41B active parameters, while Ministral 3 targets local and edge use cases. The models are released under Apache 2.0 and are available through Mistral AI Studio, Hugging Face, Amazon Bedrock, and other platforms.
Introducing Mistral Small 4 ★ 78
Mistral AI News · 50 days ago · Release
Mistral Small 4 is the next major release in the Mistral Small family, unifying Magistral-style reasoning, Pixtral-style multimodality, and Devstral-style coding agents. It uses a MoE architecture with 119B total parameters, 6B active parameters per token, a 256k context window, and configurable reasoning effort. The model is available via Mistral API, AI Studio, Hugging Face, open-source serving stacks, and NVIDIA deployment options.
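The parameter counts quoted for Mistral Large 3 (675B total, 41B active) and Mistral Small 4 (119B total, 6B active) imply that only a small share of each model's weights is used per token. A minimal sketch of that arithmetic, using only the figures from the summaries above; the function name `active_fraction` is a hypothetical helper chosen for illustration:

```python
def active_fraction(total_b: float, active_b: float) -> float:
    """Share of parameters active per token, given counts in billions."""
    return active_b / total_b


# (total, active) parameter counts in billions, as quoted in the summaries.
models = {
    "Mistral Large 3": (675, 41),
    "Mistral Small 4": (119, 6),
}

for name, (total, active) in models.items():
    share = active_fraction(total, active)
    print(f"{name}: {share:.1%} of weights active per token")
```

This only compares parameter counts. It says nothing about memory use, since all experts still have to be stored even though few are active for any one token.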
Exploring 2-bit QAT: Can Ultra-Compressed Large Models Outperform 4-bit Models Half Their Size?
r/LocalLLaMA top day · 50 days ago · Commentary
A popular Reddit thread on r/LocalLLaMA discusses the potential of 2-bit Quantization Aware Training (QAT) for large MoE models (120B to 400B). While current QAT efforts focus on 4-bit, users speculate whether a 2-bit QAT model could fit into consumer hardware (64GB/128GB RAM) and outperform a 4-bit model of half its size. This approach is proposed as a practical alternative to training ternary (1.58-bit) LLMs from scratch.
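The comparison in that thread rests on simple storage arithmetic: halving the bits per weight lets a model with twice the parameters occupy the same space. A rough sketch, assuming weights are stored at exactly the nominal bit width and ignoring quantization scales, the KV cache, and activations (real quantized files are usually somewhat larger); `weight_gb` is a hypothetical helper, not part of any library:

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes) at a nominal bit width."""
    return params_billion * bits_per_weight / 8


# Model sizes from the thread's 120B to 400B range, at 2 bits per weight.
for params in (120, 400):
    print(f"{params}B @ 2-bit: {weight_gb(params, 2):.0f} GB")

# A 4-bit model of half the size needs the same space as the 2-bit model.
print(f"200B @ 4-bit: {weight_gb(200, 4):.0f} GB")
```

Under these assumptions a 120B model at 2 bits needs about 30 GB for weights and a 400B model about 100 GB, which is why the thread frames the question around 64GB and 128GB machines.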
Thinking Machines launches native interaction model TML-Interaction-Small 276B-A12B: new real-time voice SOTA, retiring traditional VAD ★ 85
Latent Space · 77 days ago · Release
According to AINews, the AI research team Thinking Machines (affectionately nicknamed "Team Thinky" by the community) has recently unveiled a new native…
Mixture of Experts (MoE) in Transformers explained: principles, trade-offs, and implementation challenges ★ 82
Hugging Face Blog · 152 days ago · Tutorial
Mixture of Experts (MoE) has become the mainstream architecture for current large language models (LLMs). This article takes an in-depth look at how MoE…
Architectural choices in China's open-source AI ecosystem: the next step beyond DeepSeek ★ 85
Hugging Face Blog · 181 days ago · Commentary
This blog post from Hugging Face reviews the full year of technical evolution since the "DeepSeek Moment" at the start of 2025 — the release of DeepSeek-V3 and…
The "DeepSeek Moment" one year on: looking back at the paradigm shift in open-source AI ★ 85
Hugging Face Blog · 188 days ago · Commentary
The DeepSeek-V3 and R1 models released in January 2025 have been hailed as the "DeepSeek Moment" in the AI world. This upheaval not only shattered the myth…
Welcome to the Falcon 3 open-source model family! TII launches new lightweight and MoE model architectures ★ 80
Hugging Face Blog · 588 days ago · Release
The Technology Innovation Institute (TII) of Abu Dhabi has officially launched the new Falcon 3 open-source model family on Hugging Face. This marks a major…
Run the Snowflake Arctic open-source LLM via API on Replicate
Replicate Blog · 826 days ago · New Tool
Snowflake recently launched a brand-new open-source large language model called "Snowflake Arctic" — a Mixture of Experts (MoE) model designed for…
SegMoE: Segmind launches a Mixture of Diffusion Experts framework ★ 75
Hugging Face Blog · 906 days ago · Release
In the large language model (LLM) space, the Mixture of Experts (MoE) architecture (as seen in models like Mixtral 8x7B) has proven capable of dramatically…
2023: the breakout year for open large language models (Open LLMs) ★ 75
Hugging Face Blog · 953 days ago · Commentary
Looking back on 2023, the most notable trend in the AI landscape was the explosive growth of open-source large language models (Open LLMs). In this annual…
Mixture of Experts (MoE) explained in detail ★ 85
Hugging Face Blog · 960 days ago · Tutorial
Mixture of Experts (MoE) has become a core technology for improving the performance and efficiency of today's large language models (LLMs). Traditional "dense…
Welcome Mixtral: a state-of-the-art open-source Mixture of Experts (MoE) model arrives on Hugging Face ★ 90
Hugging Face Blog · 960 days ago · Release
French AI startup Mistral AI officially released its highly anticipated open-source Mixture of Experts (MoE) model — Mixtral 8x7B. The model caused a sensation…