Latest in AI

Showing:ResearchersOpen-sourceClear ×

← Home

Topic

Release New Tool Tutorial Business Paper Benchmark Opinion Regulation

For

General Developers Designers Product Founders Marketing Researchers Students

Xiaomi Claims 1,000+ TPS on a 1T Model Using a Standard 8-GPU Server★ 72
r/LocalLLaMA top day50 days agoBenchmark
Xiaomi announced MiMo-V2.5-Pro-UltraSpeed with TileRT, claiming over 1,000 tokens/s decode speed on a 1-trillion-parameter MoE model. The company says it runs on a single standard 8-GPU commodity node, not wafer-scale or SRAM-heavy specialized hardware. The claimed stack combines FP4 MoE expert quantization, DFlash speculative decoding, and TileRT low-latency inference kernels, but independent validation is still needed.
Luce Spark: a 35B MoE on a 16 GB GPU, without the offload tax★ 72
r/LocalLLaMA top day50 days agoNew Tool
Luce Spark is an open-source MoE offload system for running 33B-35B A3B models on 16GB-class GPUs. It keeps frequently routed experts on GPU, stores the long tail in system RAM, and swaps cold experts through a bounded async cache. The author reports 13.3 GiB for Qwen3.6 35B-A3B and about 100 tok/s with Spark optimizations, but notes real 16GB GPU testing is still missing.
OpenEnv coordination expands to HF, PyTorch, Unsloth, Modal, and more
r/LocalLLaMA top day50 days agoNew Tool
OpenEnv is a tool for creating agentic execution environments such as terminals, browsers, or other systems an agent can interact with. The project will now be coordinated by a committee including Meta-PyTorch, Reflection, Unsloth, Modal, Prime Intellect, Nvidia, Mercor, Fleet AI, and Hugging Face. The post also lists many AI organizations supporting or adopting OpenEnv, positioning it as infrastructure for open-source agent training.
[3090] Gemma4 QAT + MTP quick TPS numbers
r/LocalLLaMA top day50 days agoBenchmark
A r/LocalLLaMA user shared quick throughput numbers for Gemma4 QAT with MTP speculative decoding on an RTX 3090 24GB setup. They report roughly 1.2-1.8x TPS improvement, with Gemma 4 31B moving from about 40 tok/s to 70-80 tok/s. The author frames this as a rough benchmark, using 11 task categories and noting stochastic variation from temp 1.0.
mtmd adds video input support in llama.cpp★ 72
r/LocalLLaMA top day50 days agoRelease
ggml-org/llama.cpp merged PR #24269, adding video input support to mtmd through mtmd-cli and /chat/completions, which also enables the web UI path. The implementation invokes a locally installed ffmpeg subprocess instead of bundling codec support, and currently extracts visual frames only, with no audio support yet. It was tested with Qwen3-VL-2B in CLI and Gemma 4 E4B in web UI, making local multimodal video experiments more accessible.
What was your local daily driver for coding last week?
r/LocalLLaMA top day50 days agoCommentary
This r/LocalLLaMA post is a brief community poll asking users what their local coding daily driver was last week. The post asks commenters to share their favorite model and quant, but the provided text does not include poll options, results, or specific model names. Its value is mainly as a community signal for tracking local LLM coding preferences.
llama.cpp PR #24277 avoids KV cell copies in kv-cache
r/LocalLLaMA top day50 days agoRelease
ggml-org/llama.cpp merged PR #24277 by ggerganov, titled “kv-cache: avoid kv cells copies.” The Reddit post says the change improves MTP performance for Gemma-4 and was merged the previous day. It is available starting with the b9551 release, making it relevant for local inference users tracking llama.cpp performance updates.
Magistral★ 78
Mistral AI News50 days agoRelease
Mistral AI announced Magistral, its first reasoning model family, with Magistral Small as a 24B open-weight Apache 2.0 model and Magistral Medium for enterprise use. The company emphasizes traceable multilingual reasoning, professional-domain use cases, and faster reasoning in Le Chat through Think mode and Flash Answers. Magistral Small is available on Hugging Face, while Magistral Medium is available in Le Chat preview and via La Plateforme API.
Upgrading agentic coding capabilities with the new Devstral models★ 72
Mistral AI News50 days agoRelease
Mistral AI announced two Devstral updates focused on agentic coding workflows: Devstral Small 1.1 and Devstral Medium. Devstral Small 1.1 remains a 24B Apache 2.0 open model and reaches 53.6% on SWE-Bench Verified. Devstral Medium reaches 61.6%, is available through Mistral’s API, and supports private deployment and custom finetuning for enterprises.
Voxtral★ 78
Mistral AI News50 days agoRelease
Mistral AI introduces Voxtral, a speech understanding model family with 24B and 3B variants under Apache 2.0. The models support long-context transcription, audio Q&A, summarization, multilingual detection, and function calling from voice. Mistral says Voxtral is competitive across transcription and audio understanding benchmarks, with API access starting at $0.001 per minute and local downloads available on Hugging Face.
Introducing Mistral 3★ 84
Mistral AI News50 days agoRelease
Mistral AI introduced Mistral 3, a new open model family under Apache 2.0. It includes Mistral Large 3, a 675B-parameter sparse MoE with 41B active parameters, plus Ministral 3 models at 3B, 8B, and 14B. The release targets frontier open-weight use, multimodal and multilingual workflows, enterprise customization, and efficient local or edge deployments.
Introducing Devstral 2 and Mistral Vibe CLI★ 76
Mistral AI News50 days agoNew Tool
Mistral introduced Devstral 2, a 123B coding model, and Devstral Small 2, a 24B variant for lighter deployment. The company reports 72.2% and 68.0% on SWE-bench Verified, respectively, with permissive open-source licensing. It also launched Mistral Vibe CLI, an open-source terminal agent for codebase exploration, multi-file edits, command execution, and IDE integration.
Heaps do lie: debugging a memory leak in vLLM
Mistral AI News50 days agoTutorial
Mistral AI published an engineering deep dive on a memory leak found during vLLM disaggregated serving tests. The leak appeared only with a specific stack involving Mistral Medium 3.1, NIXL, UCX, graph compilation, and P/D disaggregation, with RSS growing steadily despite heap profilers looking normal. The team used pmap, BPFtrace, and targeted GDB automation to trace the issue to UCX mmap hooks and applied configuration fixes plus a vLLM patch.
Leanstral: Open-Source Foundation for Trustworthy Vibe-Coding★ 76
Mistral AI News50 days agoRelease
Mistral AI introduced Leanstral, an open-source code agent designed for Lean 4 and formal proof engineering. The model is available through Apache 2.0 weights, Mistral Vibe, and a Labs API endpoint. Mistral positions it as a cost-efficient alternative for verified coding workflows, with FLTEval benchmarks comparing it against Claude family models and large open-source competitors.
Mistral AI partners with NVIDIA to accelerate open frontier models★ 74
Mistral AI News50 days agoBusiness
Mistral AI announced it is a founding member of the NVIDIA Nemotron Coalition, a global initiative for open frontier foundation models. The partnership combines Mistral AI’s model architecture, training techniques, multimodal capabilities, and enterprise fine-tuning tools with NVIDIA compute, development tools, and synthetic data pipelines. The coalition’s first initiative is a DGX Cloud-trained base model that will support the upcoming NVIDIA Nemotron 4 family and be open-sourced for specialization.
Introducing Mistral Small 4★ 76
Mistral AI News50 days agoRelease
Mistral AI introduced Mistral Small 4 as the next major release in the Mistral Small family. It combines reasoning, multimodal, and agentic coding capabilities into one open model with configurable reasoning effort. The model uses a MoE architecture, supports a 256k context window and text-image inputs, and is available through Mistral API, AI Studio, Hugging Face, NVIDIA NIM, and common inference stacks.
Voxtral TTS: Open-Weights, Low-Latency Text-to-Speech from Mistral AI★ 78
Mistral AI News50 days agoRelease
Mistral AI introduced Voxtral TTS, its first text-to-speech model, focused on realistic multilingual voice generation. The 4B-parameter model supports nine languages, quick voice adaptation from short references, and low-latency streaming for voice agents. Mistral says human evaluations show stronger naturalness than ElevenLabs Flash v2.5, with API access, Studio testing, Le Chat access, and open weights on Hugging Face.
Voxtral TTS★ 76
Mistral AI News50 days agoRelease
Mistral AI introduced Voxtral TTS, its first text-to-speech model, targeting natural multilingual voice generation across nine languages. The 4B-parameter model supports voice adaptation from short references, emotional expressiveness, dialect handling, and low-latency streaming. It is available through API, Mistral Studio, and Le Chat, with open weights on Hugging Face under a non-commercial CC BY NC 4.0 license.
Introducing Mistral 3★ 78
Mistral AI News50 days agoRelease
Mistral AI introduced Mistral 3, a new open model family including Mistral Large 3 and Ministral 3 models at 3B, 8B, and 14B sizes. Large 3 is a 675B-parameter sparse MoE model with 41B active parameters, while Ministral 3 targets local and edge use cases. The models are released under Apache 2.0 and are available through Mistral AI Studio, Hugging Face, Amazon Bedrock, and other platforms.
Introducing Mistral Small 4★ 78
Mistral AI News50 days agoRelease
Mistral Small 4 is the next major release in the Mistral Small family, unifying Magistral-style reasoning, Pixtral-style multimodality, and Devstral-style coding agents. It uses a MoE architecture with 119B total parameters, 6B active parameters per token, a 256k context window, and configurable reasoning effort. The model is available via Mistral API, AI Studio, Hugging Face, open-source serving stacks, and NVIDIA deployment options.
VAST Raises Nearly $200M and Reveals Its Project Eden World Model Roadmap★ 74
量子位 QbitAI50 days agoBusiness
VAST completed nearly $200 million in A+ and A++ financing after its March 2026 Series A. The company also unveiled Project Eden, a world model approach that separates persistent state transition from generative visual rendering. The roadmap targets persistent virtual environments, multiplayer interaction, reusable scenes, AI-native sandbox creation, and embodied AI simulation, while acknowledging unresolved challenges in complex physics and autonomous state maintenance.
World’s First Robot Training Property: 300,000 Chinese Homes for Robots
量子位 QbitAI50 days agoRelease
Daxiao Robot and CUHK MMLab introduced Kairos-Homeworld, an open project with 300,000 Chinese residential floor plans and 5,000 interactive 3D home scenes. It can generate full household environments from prompts, including layouts, furniture, objects, and physical properties. The article frames it alongside Kairos 3.0-4B as part of a broader embodied AI stack: world model, data, and environment.
Huawei Cloud Launches Agentic AI Products for Enterprise AI Infrastructure★ 72
量子位 QbitAI50 days agoRelease
Huawei Cloud announced an Agentic Infra framework at its INSPIRE event, covering token generation, persistent memory, unified scheduling, and secure autonomous runtime. The release includes AICS, AMS, CCE Volcano Next, AgentSphere, ModelArts Next, AgentArts, and the open-source openJiuwen project. It also introduced industry AI zones, CloudRobo for embodied AI, security offerings, and an ecosystem plan with major Chinese model vendors.
CVPR 2026 Highlights Guangdong as He Kaiming and GDUT Team Stand Out★ 76
量子位 QbitAI50 days agoPaper
CVPR 2026 named Google DeepMind’s D4RT as Best Paper for fast dynamic 4D scene reconstruction from video. Honorable mentions included Meta’s SAM 3D and NVIDIA’s NitroGen, while TRELLIS.2 won Best Student Paper. The article emphasizes Chinese researcher visibility, ResNet and YOLO receiving the Longuet-Higgins Prize, and a GDUT-led undergraduate-heavy ChordEdit team breaking through among major labs and elite universities.
JoyAI-Echo open-source framework targets stable 5-minute AI long videos★ 72
量子位 QbitAI50 days agoNew Tool
QbitAI reports that JD’s team has open-sourced JoyAI-Echo, a long audio-video generation framework for multi-minute AI videos. It targets character drift, unstable voice, slow inference, and blurry output through cross-modal memory, memory-driven post-training, and lightweight real-time super-resolution. The system also includes a Director Agent for script planning, shot-level generation, localized edits, and iterative video production.
Thoughts on Gemma4 12B vs 26A4B: Which Is Better?
r/LocalLLaMA top day50 days agoOpinion
The post asks the LocalLLaMA community to compare Gemma4 12B and 26A4B, explicitly excluding the 31B model from discussion. The user is mainly interested in creative tasks, writing, and chatting, with coding treated as optional rather than central. No benchmarks or examples are provided, so the post is best read as a model-selection question about subjective quality and practical use.
Gemma 4 31B FP8 Matches Claude Sonnet 4.6 Medium in Custom Benchmark★ 75
r/LocalLLaMA top day50 days agoBenchmark
A Reddit user shared benchmark results showing Google's Gemma 4 31B (FP8) performing on par with Claude Sonnet 4.6 Medium. The custom evaluation harness tested complex tasks including Neo4j Cypher queries, entity extraction, agentic tool calling, Python coding, and multi-vector retrieval synthesis. This highlights how quantized mid-sized open-source models are closing the gap with leading proprietary frontier models.
Best Local TTS Solution
r/LocalLLaMA top day50 days agoCommentary
A r/LocalLLaMA user says they have tested many local TTS tools, but none match ElevenLabs for expressiveness, voices, and cloning. They list moss-nano and Kokoro as the best edge-device candidates so far, with edgeTTS as a free/cloud option. The post asks for community experience connecting agents such as Hermes, openclaw, or opencode to Telegram voice notes or real-time voice conversations.
A Matter Wi-Fi Light Bulb in Rust on the Raspberry Pi Pico 2 W
Hacker News (AI keywords)50 days agoHardware
This GitHub repository collects Rust Embassy examples for Raspberry Pi Pico 2 and Pico 2 W. Its Matter Wi-Fi light example uses rs-matter, BLE commissioning, and Wi-Fi connectivity so the board can appear as a standard smart bulb in Home Assistant, Apple Home, or Google Home. The project is mainly relevant to embedded Rust and smart-home developers, not AI model users.
The Open Source Community is backing OpenEnv for Agentic RL
Hugging Face Blog50 days agoCommentary
The title indicates that OpenEnv is being positioned around agentic reinforcement learning. The confirmed signal is community support from the open-source ecosystem, not specific technical claims. Without the full article, details such as contributors, features, integrations, benchmarks, or adoption status should be treated as unknown.

← PreviousPage 4Next →

Latest in AI

Xiaomi Claims 1,000+ TPS on a 1T Model Using a Standard 8-GPU Server★ 72

Luce Spark: a 35B MoE on a 16 GB GPU, without the offload tax★ 72

OpenEnv coordination expands to HF, PyTorch, Unsloth, Modal, and more

[3090] Gemma4 QAT + MTP quick TPS numbers

mtmd adds video input support in llama.cpp★ 72

What was your local daily driver for coding last week?

llama.cpp PR #24277 avoids KV cell copies in kv-cache

Magistral★ 78

Upgrading agentic coding capabilities with the new Devstral models★ 72

Voxtral★ 78

Introducing Mistral 3★ 84

Introducing Devstral 2 and Mistral Vibe CLI★ 76

Heaps do lie: debugging a memory leak in vLLM

Leanstral: Open-Source Foundation for Trustworthy Vibe-Coding★ 76

Mistral AI partners with NVIDIA to accelerate open frontier models★ 74

Introducing Mistral Small 4★ 76

Voxtral TTS: Open-Weights, Low-Latency Text-to-Speech from Mistral AI★ 78

Voxtral TTS★ 76

Introducing Mistral 3★ 78

Introducing Mistral Small 4★ 78

VAST Raises Nearly $200M and Reveals Its Project Eden World Model Roadmap★ 74

World’s First Robot Training Property: 300,000 Chinese Homes for Robots

Huawei Cloud Launches Agentic AI Products for Enterprise AI Infrastructure★ 72

CVPR 2026 Highlights Guangdong as He Kaiming and GDUT Team Stand Out★ 76

JoyAI-Echo open-source framework targets stable 5-minute AI long videos★ 72

Thoughts on Gemma4 12B vs 26A4B: Which Is Better?

Gemma 4 31B FP8 Matches Claude Sonnet 4.6 Medium in Custom Benchmark★ 75

Best Local TTS Solution

A Matter Wi-Fi Light Bulb in Rust on the Raspberry Pi Pico 2 W

The Open Source Community is backing OpenEnv for Agentic RL