Latest in AI

Showing: Open-source

Introducing Mistral Small 4 ★ 76
Mistral AI News · 50 days ago · Release
Mistral AI introduced Mistral Small 4 as the next major release in the Mistral Small family. It combines reasoning, multimodal, and agentic coding capabilities into one open model with configurable reasoning effort. The model uses a MoE architecture, supports a 256k context window and text-image inputs, and is available through Mistral API, AI Studio, Hugging Face, NVIDIA NIM, and common inference stacks.
Voxtral TTS: Open-Weights, Low-Latency Text-to-Speech from Mistral AI ★ 78
Mistral AI News · 50 days ago · Release
Mistral AI introduced Voxtral TTS, its first text-to-speech model, focused on realistic multilingual voice generation. The 4B-parameter model supports nine languages, quick voice adaptation from short references, and low-latency streaming for voice agents. Mistral says human evaluations show stronger naturalness than ElevenLabs Flash v2.5, with API access, Studio testing, Le Chat access, and open weights on Hugging Face.
Remote agents in Vibe, powered by Mistral Medium 3.5 ★ 78
Mistral AI News · 50 days ago · New Tool
Mistral Medium 3.5 is a 128B dense model in public preview, combining instruction-following, reasoning, and coding with a 256k context window. It becomes the default model for Le Chat and Mistral Vibe. Vibe now supports remote coding agents that run asynchronously in the cloud, while Le Chat adds Work mode for longer multi-step tasks across connected tools.
Voxtral TTS ★ 76
Mistral AI News · 50 days ago · Release
Mistral AI introduced Voxtral TTS, its first text-to-speech model, targeting natural multilingual voice generation across nine languages. The 4B-parameter model supports voice adaptation from short references, emotional expressiveness, dialect handling, and low-latency streaming. It is available through API, Mistral Studio, and Le Chat, with open weights on Hugging Face under a non-commercial CC BY NC 4.0 license.
Introducing Mistral 3 ★ 78
Mistral AI News · 50 days ago · Release
Mistral AI introduced Mistral 3, a new open model family including Mistral Large 3 and Ministral 3 models at 3B, 8B, and 14B sizes. Large 3 is a 675B-parameter sparse MoE model with 41B active parameters, while Ministral 3 targets local and edge use cases. The models are released under Apache 2.0 and are available through Mistral AI Studio, Hugging Face, Amazon Bedrock, and other platforms.
Introducing Mistral Small 4 ★ 78
Mistral AI News · 50 days ago · Release
Mistral Small 4 is the next major release in the Mistral Small family, unifying Magistral-style reasoning, Pixtral-style multimodality, and Devstral-style coding agents. It uses a MoE architecture with 119B total parameters, 6B active parameters per token, a 256k context window, and configurable reasoning effort. The model is available via Mistral API, AI Studio, Hugging Face, open-source serving stacks, and NVIDIA deployment options.
VAST Raises Nearly $200M and Reveals Its Project Eden World Model Roadmap ★ 74
量子位 QbitAI · 50 days ago · Business
VAST completed nearly $200 million in A+ and A++ financing after its March 2026 Series A. The company also unveiled Project Eden, a world model approach that separates persistent state transition from generative visual rendering. The roadmap targets persistent virtual environments, multiplayer interaction, reusable scenes, AI-native sandbox creation, and embodied AI simulation, while acknowledging unresolved challenges in complex physics and autonomous state maintenance.
World’s First Robot Training Property: 300,000 Chinese Homes for Robots
量子位 QbitAI · 50 days ago · Release
Daxiao Robot and CUHK MMLab introduced Kairos-Homeworld, an open project with 300,000 Chinese residential floor plans and 5,000 interactive 3D home scenes. It can generate full household environments from prompts, including layouts, furniture, objects, and physical properties. The article frames it alongside Kairos 3.0-4B as part of a broader embodied AI stack: world model, data, and environment.
Huawei Cloud Launches Agentic AI Products for Enterprise AI Infrastructure ★ 72
量子位 QbitAI · 50 days ago · Release
Huawei Cloud announced an Agentic Infra framework at its INSPIRE event, covering token generation, persistent memory, unified scheduling, and secure autonomous runtime. The release includes AICS, AMS, CCE Volcano Next, AgentSphere, ModelArts Next, AgentArts, and the open-source openJiuwen project. It also introduced industry AI zones, CloudRobo for embodied AI, security offerings, and an ecosystem plan with major Chinese model vendors.
CVPR 2026 Highlights Guangdong as He Kaiming and GDUT Team Stand Out ★ 76
量子位 QbitAI · 50 days ago · Paper
CVPR 2026 named Google DeepMind’s D4RT as Best Paper for fast dynamic 4D scene reconstruction from video. Honorable mentions included Meta’s SAM 3D and NVIDIA’s NitroGen, while TRELLIS.2 won Best Student Paper. The article emphasizes Chinese researcher visibility, ResNet and YOLO receiving the Longuet-Higgins Prize, and a GDUT-led undergraduate-heavy ChordEdit team breaking through among major labs and elite universities.
JoyAI-Echo open-source framework targets stable 5-minute AI long videos ★ 72
量子位 QbitAI · 50 days ago · New Tool
QbitAI reports that JD’s team has open-sourced JoyAI-Echo, a long audio-video generation framework for multi-minute AI videos. It targets character drift, unstable voice, slow inference, and blurry output through cross-modal memory, memory-driven post-training, and lightweight real-time super-resolution. The system also includes a Director Agent for script planning, shot-level generation, localized edits, and iterative video production.
How the UK Is Turning Sovereign AI Ambition Into Action With NVIDIA Technologies ★ 72
NVIDIA Blog · 50 days ago · Business
NVIDIA says the UK’s “AI maker” strategy is moving into deployment through domestic AI cloud infrastructure, Isambard-AI, and the Sovereign AI Fund. UK startups are using NVIDIA technologies for coding agents, self-improving AI, inference optimization, and biological foundation models. The post also covers NVIDIA’s UK startup investment, developer training, 6G collaboration, and enterprise AI projects moving from pilots into production.
Thoughts on Gemma4 12B vs 26A4B: Which Is Better?
r/LocalLLaMA top day · 50 days ago · Opinion
The post asks the LocalLLaMA community to compare Gemma4 12B and 26A4B, explicitly excluding the 31B model from discussion. The user is mainly interested in creative tasks, writing, and chatting, with coding treated as optional rather than central. No benchmarks or examples are provided, so the post is best read as a model-selection question about subjective quality and practical use.
Community Discussion: Local Installation and Multilingual Training for Kokoro TTS
r/LocalLLaMA top day · 50 days ago · Commentary
A LocalLLaMA subreddit post discusses challenges with Kokoro TTS's multilingual performance on cloud APIs. The author is seeking community advice on how to install Kokoro locally and train/fine-tune it for Brazilian Portuguese to achieve more natural-sounding speech.
Gemma 4 31B FP8 Matches Claude Sonnet 4.6 Medium in Custom Benchmark ★ 75
r/LocalLLaMA top day · 50 days ago · Benchmark
A Reddit user shared benchmark results showing Google's Gemma 4 31B (FP8) performing on par with Claude Sonnet 4.6 Medium. The custom evaluation harness tested complex tasks including Neo4j Cypher queries, entity extraction, agentic tool calling, Python coding, and multi-vector retrieval synthesis. On this one user's harness, the result suggests that quantized mid-sized open-source models are closing the gap with leading proprietary frontier models.
NVIDIA and LG Group Build an AI Factory for Physical AI, Mobility and AI Infrastructure ★ 74
NVIDIA Blog · 50 days ago · Hardware
NVIDIA and LG Group are collaborating on an AI factory to support LG’s AI-driven businesses across robotics, autonomous driving, data center technologies and GPU cloud services. The effort connects NVIDIA’s AI factory platform with LG’s manufacturing, mobility, robotics and infrastructure capabilities. It also covers Isaac, Cosmos, DRIVE, DSX and EXAONE-related work using Blackwell GPUs, NeMo, Nemotron datasets and TensorRT-LLM.
Best Local TTS Solution
r/LocalLLaMA top day · 50 days ago · Commentary
An r/LocalLLaMA user says they have tested many local TTS tools, but none match ElevenLabs for expressiveness, voices, and cloning. They list moss-nano and Kokoro as the best edge-device candidates so far, with edgeTTS as a free/cloud option. The post asks for community experience connecting agents such as Hermes, openclaw, or opencode to Telegram voice notes or real-time voice conversations.
A Matter Wi-Fi Light Bulb in Rust on the Raspberry Pi Pico 2 W
Hacker News (AI keywords) · 50 days ago · Hardware
This GitHub repository collects Rust Embassy examples for Raspberry Pi Pico 2 and Pico 2 W. Its Matter Wi-Fi light example uses rs-matter, BLE commissioning, and Wi-Fi connectivity so the board can appear as a standard smart bulb in Home Assistant, Apple Home, or Google Home. The project is mainly relevant to embedded Rust and smart-home developers, not AI model users.
User Shares Gemma 4 QAT Experience: Improved Quality and MTP Speedups
r/LocalLLaMA top day · 50 days ago · Opinion
A Reddit user shared their experience with the Gemma 4 31B QAT (Quantization-Aware Training) model. Compared to traditional GGUF quants like Q6_K_L, the user reports that the QAT version delivers noticeably better quality in roleplay and long-context tasks. Combining the QAT model with Multi-Token Prediction (MTP) also raised generation speed from ~20 t/s to as much as 50 t/s, a gain of up to roughly 2.5x.
The Open Source Community is backing OpenEnv for Agentic RL
Hugging Face Blog · 50 days ago · Commentary
The title indicates that OpenEnv is being positioned around agentic reinforcement learning. The confirmed signal is community support from the open-source ecosystem, not specific technical claims. Without the full article, details such as contributors, features, integrations, benchmarks, or adoption status should be treated as unknown.
"Fully Hallucinated Operating System" Simulates an Entire OS via LLM Prompts
r/LocalLLaMA top day · 50 days ago · Commentary
A popular Reddit post highlights a video demonstrating a "Fully Hallucinated Operating System" run entirely inside an LLM. By prompting the model to act as a terminal, it simulates file systems, network requests, and command execution purely through text generation. While impractical for production, this experiment showcases the impressive state-tracking and "world model" capabilities of modern LLMs.
llama-server Router Mode: Pinned Model Grabs CUDA Context on All GPUs, Causing OOM
r/LocalLLaMA top day · 50 days ago · Commentary
A Reddit user highlighted a limitation in llama-server's router mode (`--models-preset`): child processes spawn and initialize CUDA contexts on all available GPUs, even when pinned to a single card. When other GPUs are fully utilized by a large model, launching a smaller model fails with a CUDA OOM error because it cannot allocate the context stub on the maxed-out cards. Currently, child processes inherit the base environment, preventing per-model `CUDA_VISIBLE_DEVICES` configuration.
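The per-model isolation the post asks for can be pictured by launching each model as its own standalone process with its own environment. The sketch below is an illustration under assumptions, not a fix for router mode: the model path and port are hypothetical, `build_launch` and `launch` are made-up helper names, `-m` and `--port` are the usual llama-server options, and `CUDA_VISIBLE_DEVICES` is the standard CUDA environment variable.

```python
import os
import subprocess


def build_launch(model_path: str, port: int, gpu_ids: list[int]) -> tuple[list[str], dict[str, str]]:
    """Return (argv, env) for one llama-server process restricted to gpu_ids."""
    env = dict(os.environ)  # copy, so the parent environment is left untouched
    # Only the listed GPUs are visible to the child, so it cannot create
    # CUDA contexts on cards that are already full.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpu_ids)
    argv = ["llama-server", "-m", model_path, "--port", str(port)]
    return argv, env


def launch(model_path: str, port: int, gpu_ids: list[int]) -> subprocess.Popen:
    """Start the server; requires llama-server on PATH (not called here)."""
    argv, env = build_launch(model_path, port, gpu_ids)
    return subprocess.Popen(argv, env=env)


# Hypothetical small model pinned to GPU 1 only.
argv, env = build_launch("models/small.gguf", 8081, [1])
print(argv, "CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"])
```

Running one process per model trades the convenience of a single router endpoint for full control over which devices each model can see.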
Exploring 2-bit QAT: Can Ultra-Compressed Large Models Outperform 4-bit Models Half Their Size?
r/LocalLLaMA top day · 50 days ago · Commentary
A popular Reddit thread on r/LocalLLaMA discusses the potential of 2-bit Quantization Aware Training (QAT) for large MoE models (120B to 400B). While current QAT efforts focus on 4-bit, users speculate whether a 2-bit QAT model could fit into consumer hardware (64GB/128GB RAM) and outperform a 4-bit model of half its size. This approach is proposed as a practical alternative to training ternary (1.58-bit) LLMs from scratch.
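The memory side of that comparison is simple arithmetic, sketched below. It counts weight storage only and assumes exactly 2 or 4 bits per weight; real quantization formats add scale and metadata overhead, and the KV cache and runtime buffers come on top. `weight_gb` is a hypothetical helper name.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal GB (weights only, no overhead)."""
    return params_billion * bits_per_weight / 8


for params in (120, 400):
    big_2bit = weight_gb(params, 2)
    half_4bit = weight_gb(params / 2, 4)
    print(f"{params}B at 2-bit: {big_2bit:.0f} GB | {params // 2}B at 4-bit: {half_4bit:.0f} GB")
```

Under these assumptions a 2-bit model and a 4-bit model of half its size occupy the same weight memory (about 30 GB for the 120B case and 100 GB for the 400B case), so the thread's question reduces to which of the two gives better quality at an equal footprint.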
NVFP4 Support Merged in llama.cpp: How to Use 4-bit Blackwell Quantization
r/LocalLLaMA top day · 50 days ago · Commentary
Following the merge of native NVFP4 (NVIDIA FP4) support in llama.cpp, users are exploring how to leverage this format on Blackwell GPUs (such as the RTX 50-series). The discussion focuses on converting NVFP4 safetensors (like Gemma 4 QAT) to GGUF format and whether importance matrices (imatrix) are required. The thread anticipates faster local LLM inference on Blackwell-class hardware as a result.
Gemma-4-26B-A4B QAT Variant Performs Poorly in llama.cpp Compared to Non-QAT Version
r/LocalLLaMA top day · 50 days ago · Benchmark
A LocalLLaMA user highlighted that the newly released QAT (Quantization-Aware Training) variant of Google's Gemma-4-26B-A4B model underperforms compared to its non-QAT predecessor. Testing via llama.cpp on a chessboard SVG generation task showed significant rendering errors in the QAT version. The non-QAT GGUF version, however, produced highly accurate results under identical settings.
Office-open-xml-viewer: Office XML document viewer rendering to HTML Canvas
Hacker News (AI keywords) · 50 days ago · New Tool
office-open-xml-viewer is an open-source browser viewer for Office Open XML documents, rendering DOCX, XLSX, and PPTX files to HTML Canvas. Its parsers are written in Rust and compiled to WebAssembly, while rendering uses the Canvas 2D API. The README also says the full codebase was implemented by Claude through iterative prompting, making it notable as an AI-assisted software development case.
Control 3D Avatars with Natural Language Using "Program as Weights" (programasweights)
r/LocalLLaMA top day · 51 days ago · New Tool
Developer Yuntian Deng introduced "programasweights," a framework that compiles plain-English descriptions into tiny, local action programs (loops, parallel tracks) to control 3D avatars. Instead of pre-defined buttons, users can command complex sequences like "wave while walking, then jump." The runtime code is open-source and runs entirely offline in the browser or via Python.
GMKtec Announces EVO-X3 Mini PC, Teases 192GB Ryzen AI MAX+ 495 "Strix Halo" Monster ★ 78
r/LocalLLaMA top day · 51 days ago · Hardware
GMKtec has announced its EVO-X3 mini PC with upgraded I/O, including OCuLink and Wi-Fi 7. More importantly for local AI enthusiasts, the company teased a future model powered by AMD's flagship "Strix Halo" Ryzen AI MAX+ 495 APU. This upcoming monster will support up to 192GB of LPDDR5X memory, offering a highly anticipated, cost-effective alternative to Apple Silicon for running large local LLMs.
Managing Multiple MCP Servers: How to Prevent Context Pollution and Token Waste
r/LocalLLaMA top day · 51 days ago · Commentary
A popular Reddit thread on r/LocalLLaMA addresses the challenge of loading multiple Model Context Protocol (MCP) servers at startup, which floods the context window with tool definitions. Users are discussing potential solutions, including using MCP proxies/hubs to route requests through a single endpoint or implementing lazy-loading. This highlights a growing need for better orchestration tools as the local MCP ecosystem expands.
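The lazy-loading idea mentioned in the thread can be sketched as a small registry that puts only a one-line summary per server into the prompt at startup and fetches full tool definitions on first use. This is a minimal illustration, not the MCP SDK: `LazyToolRegistry`, its methods, and the fake `files` server are all hypothetical, and the loader stands in for whatever call would actually list a server's tools.

```python
from typing import Callable

ToolDefs = list[dict]


class LazyToolRegistry:
    """Keeps short summaries in context; loads full tool definitions on demand."""

    def __init__(self) -> None:
        self._summaries: dict[str, str] = {}
        self._loaders: dict[str, Callable[[], ToolDefs]] = {}
        self._loaded: dict[str, ToolDefs] = {}

    def register(self, server: str, summary: str, loader: Callable[[], ToolDefs]) -> None:
        # Registration stores the loader but does not call it.
        self._summaries[server] = summary
        self._loaders[server] = loader

    def context_stub(self) -> list[dict]:
        """What enters the prompt at startup: one short entry per server."""
        return [{"server": s, "summary": d} for s, d in self._summaries.items()]

    def tools_for(self, server: str) -> ToolDefs:
        """Load a server's full tool definitions the first time they are needed."""
        if server not in self._loaded:
            self._loaded[server] = self._loaders[server]()
        return self._loaded[server]


registry = LazyToolRegistry()
registry.register("files", "read and write local files", lambda: [{"name": "read_file"}])
print(registry.context_stub())      # small startup footprint
print(registry.tools_for("files"))  # full definitions fetched only now
```

The proxy or hub approach from the same thread would sit one level up: a single endpoint exposing something like `context_stub` and routing each call to the right server.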
llama.cpp Gemma4 MTP Support Merged
r/LocalLLaMA top day · 51 days ago · Release
llama.cpp PR #23398 was merged on June 7, 2026, adding MTP support for Gemma4 models. The author reports over 2x average speedup on dense models, no observed speedup on MoE, and replicated AIME-26 results around 87%. Support currently covers the 31B and 26B-A4B variants, while E4B and E2B are not supported yet; multi-GPU setups may need extra draft-device configuration.
