Latest in AI

Anthropic Claude Fable 5: Mythos-Class Power with Controversial Terms ★ 84
Latent Space · 48 days ago · Release
Anthropic released Claude Fable 5 as its first broadly available Mythos-class model, alongside restricted Mythos 5 access. Benchmarks and ecosystem reports show strong gains in coding, long-horizon agentic tasks, research, and vision. The controversy centers on 30-day retention for Mythos-class traffic and silent interventions that may reduce effectiveness on frontier LLM development tasks, raising trust, reproducibility, and open AI concerns.
Without open LLM competition, closed-source LLM companies will become insatiable
r/LocalLLaMA top day · 48 days ago · Opinion
An r/LocalLLaMA user criticizes closed-source LLM providers, singling out Anthropic and its $200/month users. The post argues that without open-source model competition, proprietary AI companies could become more arrogant and less accountable to customers. The source offers little concrete context beyond an image and opinionated commentary, so it is best read as a community sentiment post rather than a verified product incident.
Releasing Apodex-1.0 Smol Models (0.8B, 2B, 4B Open-Weights) Optimized for Agentic Verification + AgentHarness Evals
r/LocalLLaMA top day · 48 days ago · Release
Apodex 1.0 launches with open-weight models at 0.8B, 2B, and 4B, trained not for general generation but for specialized sub-agent roles—fact-checking external claims and verifying tool call outputs before passing results to a main controller. The design targets long-horizon agent workflows where routing small tasks to lightweight models avoids wasteful use of 70B+ models at every step. AgentHarness, an open-source evaluation framework for local multi-step agent pipelines, is released alongside the weights.
Furiosa AI inference chip could be a game changer for local LLMs
r/LocalLLaMA top day · 48 days ago · Hardware
An r/LocalLLaMA post discusses Furiosa AI’s RNGD inference chip, citing TSMC 5nm, Hynix HBM3, 48GB VRAM, 1.5TB/s bandwidth, and 180W TDP. The author argues it could matter for local LLM users if Furiosa opens its programming interface and works with llama.cpp on a GGML backend. The post later clarifies that Furiosa is not selling to consumers; this is a wish and market commentary, not a launch.
Can Voice Agents Handle Bilingual Customers? Benchmarking Frontier ASR on Code-Switched Speech
Hugging Face Blog · 48 days ago · Benchmark
Code-switching—where bilingual speakers blend two languages in a single utterance—is common in markets like Taiwan, Singapore, and India, yet most ASR benchmarks focus on monolingual audio. ServiceNow AI evaluates frontier speech recognition models specifically on this mixed-language scenario. The findings help enterprise teams make informed ASR model choices when deploying voice agents for multilingual customer-facing applications.
OSCAR RotationZoo - Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization
r/LocalLLaMA top day · 48 days ago · Paper
OSCAR applies offline-precomputed rotation matrices—derived from spectral covariance analysis—to reshape KV tensor distributions before 2-bit quantization, suppressing outliers and reducing rounding error. The rotation adds negligible inference overhead since it requires no runtime learning. GGUF downloads for Gemma-4-12B-it, Qwen3-32B, and Qwen3-4B-Thinking are available, with llama.cpp and sglang integrations and an arXiv paper.
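The intuition behind rotating before low-bit quantization can be illustrated with a toy example. The sketch below is not OSCAR's method: it swaps in a fixed normalized Hadamard matrix for the covariance-derived rotation described in the summary, uses plain min-max 2-bit quantization, and runs on a made-up four-element vector with one outlier. All function names are hypothetical.

```python
# Normalized 4x4 Hadamard matrix: orthogonal and symmetric, so it is its own inverse.
# Assumption: this is a stand-in for a learned or precomputed rotation.
H = [[0.5 * s for s in row] for row in (
    (1, 1, 1, 1),
    (1, -1, 1, -1),
    (1, 1, -1, -1),
    (1, -1, -1, 1),
)]


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def quantize_2bit(v):
    """Min-max uniform quantization to 4 levels, returned already dequantized."""
    lo, hi = min(v), max(v)
    scale = (hi - lo) / 3 or 1.0  # 3 steps between 4 levels; guard constant vectors
    return [lo + round((x - lo) / scale) * scale for x in v]


def sq_error(a, b):
    """Sum of squared differences between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


x = [10.0, 0.5, -0.5, 0.25]  # made-up vector with a single large outlier

# Quantize directly: the outlier stretches the range and the small values collapse.
err_plain = sq_error(x, quantize_2bit(x))

# Rotate, quantize, rotate back: the outlier's energy is spread across all entries.
x_rot_hat = matvec(H, quantize_2bit(matvec(H, x)))
err_rot = sq_error(x, x_rot_hat)

print(f"squared error without rotation: {err_plain:.4f}")  # 1.5625
print(f"squared error with rotation:    {err_rot:.4f}")    # about 0.0139
```

Because the rotation is fixed ahead of time, the only runtime cost in this toy is the matrix-vector product, which matches the summary's point that no runtime learning is involved.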
SCAIL-2: Open-Source End-to-End Character Animation Without Intermediate Pose Representations
r/LocalLLaMA top day · 48 days ago · Release
SCAIL-2 by zai-org removes the reliance on skeleton maps and inpainting masks common in prior character animation pipelines, driving characters directly from video in an end-to-end manner. Trained on 60K synthesized motion pairs using SCAIL-Preview, Wan-Animate, and MoCha via a Unified Motion Transfer Interface with RoPE design, the model develops emergent abilities beyond its teacher models. Capabilities include cross-identity character replacement, animal-driving scenarios, and zero-shot support for SAM3D-Body mesh rendering.
Releasing Cohere North Mini Code
r/LocalLLaMA top day · 48 days ago · Release
Cohere’s Jay Alammar announced the official release of North Mini Code after early community feedback from r/LocalLLaMA. Weights are available on Hugging Face, including an fp8 version, and the model can be tried for free through OpenCode. For vLLM deployment, Cohere recommends using vLLM main for now and installing cohere_melody for accurate response parsing, while noting community requests for quantization and llama.cpp support.
Watch agents fight: a live challenge to speed up Gemma 4 E4B inference on a single A10G
r/LocalLLaMA top day · 48 days ago · Benchmark
A public HuggingFace Spaces dashboard hosts a live competition where AI agents race to optimize Gemma 4 E4B inference throughput on a single NVIDIA A10G GPU. The challenge gamifies ML inference engineering, letting anyone watch agents explore quantization and scheduling strategies in real time. Optimization recipes surfaced by the competition offer practical value for developers targeting single-GPU self-hosted Gemma 4 deployments.
Cohere North Mini Code 1.0
r/LocalLLaMA top day · 48 days ago · Release
CohereLabs’ North Mini Code 1.0 appears to have moved from early access to final release, with weights available on Hugging Face. The Reddit post describes it as a 30B A3B coding model. Its Artificial Analysis overall score of 28 trails Qwen 3.6 35B at 43, but its coding index score of 33 is close to Qwen’s 35 and above Gemma 4 26B’s 22.
Unsloth Gemma 4 QAT MTP assistant models now available
r/LocalLLaMA top day · 48 days ago · Release
An r/LocalLLaMA post notes that Unsloth’s Gemma 4 QAT MTP assistant models are now available in GGUF format. The root directories include q8_0 files named mtp-gemma-4-*.gguf, while MTP folders contain q8_0 and larger quantized variants. The listed releases cover 12B, 26B-A4B, 31B, E2B, E2B mobile, E4B, and E4B mobile it-qat-GGUF repositories.
TTS Benchmark Revamped with Objective Standards and Blind ELO Voting (46 Models)
r/LocalLLaMA top day · 48 days ago · Benchmark
Reddit user UkieTechie has revamped their TTS benchmark platform with objective scoring standards and live blind voting, now covering 46 speech synthesis models. Hosted on Hugging Face Space, the arena lets users vote on audio quality without knowing the model name, generating a dynamic ELO leaderboard. The project is open-source on GitHub and welcomes community submissions of new models.
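For readers unfamiliar with how blind pairwise votes become a leaderboard, here is a minimal sketch of the standard Elo update. The summary does not state which rating formula, K-factor, or starting rating the arena uses, so `k=32.0`, the 1000-point starting rating, and the model names below are assumptions for illustration only.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one blind vote.

    k is the assumed K-factor: larger values move ratings faster per vote.
    """
    delta = k * ((1.0 if a_won else 0.0) - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Two hypothetical models start level; one vote goes to model_x.
ratings = {"model_x": 1000.0, "model_y": 1000.0}
ratings["model_x"], ratings["model_y"] = update_elo(
    ratings["model_x"], ratings["model_y"], a_won=True
)
print(ratings)  # {'model_x': 1016.0, 'model_y': 984.0}
```

The update is zero-sum: whatever one model gains, the other loses, so the leaderboard average stays fixed as votes accumulate.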
Single-slot half-height PCIe V100 with NVLink appears in China
r/LocalLLaMA top day · 49 days ago · Hardware
An r/LocalLLaMA post says a Bilibili creator has shown a single-slot, half-height PCIe V100 with NVLink on a custom PCB. The card is described as 16 cm long, passively cooled by default, and capped at 75W, with another version supporting up to 300W. The 16GB model is expected around or below ¥1500, with a 32GB version reportedly planned, but it is not yet available for purchase.
Rick & Morty
r/LocalLLaMA top day · 49 days ago · Commentary
This r/LocalLLaMA top-day post is a short image meme titled “Rick & Morty.” The only accompanying text says, “nobody expected HF there,” suggesting surprise at HF appearing in the image’s context. There are no technical claims, model details, releases, or benchmarks, so its value is mainly as a small signal of community culture around Hugging Face / HF and local LLM discussions.
Google Introduces Gemma 4 12B: A Unified, Encoder-Free Multimodal Model ★ 85
Google DeepMind Blog · 49 days ago · Release
Google DeepMind has unveiled Gemma 4 12B, a next-generation open-weights model featuring a unified, encoder-free multimodal architecture. By eliminating the traditional separate vision encoder (such as ViT), it processes diverse modalities directly within a single Transformer network. This design simplifies training, reduces inference latency, and enhances cross-modal alignment, marking a significant milestone for open-source AI.
Apple Announced a New On-Device Inference Engine for Apple Silicon
r/LocalLLaMA top day · 49 days ago · Release
Apple announced CoreAI at WWDC, which the post frames as a possible future replacement for CoreML and an alternative to MLX, llama.cpp, and torch for optimized on-device inference. Models still need conversion through Python scripts, and current supported models appear mostly from mid-2025. No performance data is available yet; the author expects it may trail MLX on GPU, but Apple’s 20B on-device foundation model claim suggests larger app-bundled models could become possible.
Jetson Orin NX Build for Hermes Agent + Benchmarking
r/LocalLLaMA top day · 49 days ago · Hardware
The post describes turning an unused Jetson Orin NX into a compact local LLM server for Hermes Agent testing. The goals were low noise, over 10 tok/s generation, 300 tok/s prompt processing, at least 65K context, and a custom case. After testing Gemma 4, Qwen 3.6, and many quant variants, the author reports Gemma 4 26B A4B UD Q2_K_XL reaching 66K context and 10.21 tok/s near 60K context.
How an Agent Built a 3D Paris Gallery by Chaining Two Hugging Face Spaces ★ 72
Hugging Face Blog · 49 days ago · Tutorial
This Hugging Face blog post demonstrates how AI agents can use Spaces as modular tools. By chaining an image generation Space with a 3D rendering Space, an agent automatically generated art assets and placed them inside a virtual 3D gallery. This highlights the power of Hugging Face's ecosystem, where any Space can serve as an API for agentic workflows.
TinySearch v0.2.0: Lightweight Open Web-Search Tool for Local LLMs Now Defaults to SearXNG
r/LocalLLaMA top day · 49 days ago · Release
TinySearch is a lightweight open-source MCP/FastAPI tool that crawls, chunks, and reranks web results into an 8k-token context blob for small local LLMs. Version 0.2.0 replaces DuckDuckGo with SearXNG as the default backend after DDG began rate-limiting and CAPTCHAing automated requests. Users can point it at a self-hosted SearXNG instance; it integrates with Cline, Roo, and OpenCode agent setups.
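The chunk, rerank, and pack flow described above can be sketched in a few lines. This is not TinySearch's code: the function names are hypothetical, keyword overlap stands in for a real reranker, and whitespace word counts stand in for a real tokenizer when enforcing the budget.

```python
def chunk(text: str, size: int = 50) -> list[str]:
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def score(query: str, passage: str) -> int:
    """Toy relevance score: number of distinct query words found in the passage."""
    return len(set(query.lower().split()) & set(passage.lower().split()))


def build_context(query: str, pages: list[str], budget: int = 8000,
                  chunk_size: int = 50) -> str:
    """Rank all chunks by score and greedily pack them under the word budget."""
    chunks = [c for page in pages for c in chunk(page, chunk_size)]
    ranked = sorted(chunks, key=lambda c: score(query, c), reverse=True)
    picked, used = [], 0
    for c in ranked:
        cost = len(c.split())  # assumed proxy for token count
        if used + cost > budget:
            continue  # skip chunks that would overflow the budget
        picked.append(c)
        used += cost
    return "\n\n".join(picked)


pages = ["llama cpp supports vulkan backends", "cats sleep most of the day"]
context = build_context("vulkan llama", pages, budget=5, chunk_size=5)
print(context)  # llama cpp supports vulkan backends
```

A fixed budget is what makes this approach practical for small local models: the controller always receives a context blob of bounded size, regardless of how many pages were crawled.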
NeuroBait: I fine-tuned a model to spark dopamine for ADHD brain
Hugging Face Blog · 49 days ago · New Tool
NeuroBait is a Hugging Face community project built to help with ADHD task-initiation freeze rather than diagnosis or to-do planning. It fine-tunes google/gemma-3-12b-it with LoRA to produce short, warm, context-aware nudges. The project uses Unsloth and Modal for training, then deploys on a Hugging Face Space with Gradio, transformers, peft, and a runtime LoRA adapter.
ByteDance Open-Sources Bernini, a Unified Framework for AI Video Editing ★ 74
量子位 QbitAI · 49 days ago · Release
ByteDance’s commercial technology team has open-sourced Bernini, a unified framework for AI video generation and editing. Its design separates semantic planning from visual rendering: an MLLM-based planner understands text, source videos, images, and video references, then a DiT-based renderer produces the final video. The released Bernini-R includes inference code and weights, while the full planner-enabled version is still being prepared.
A 4B Edge-Deployable Cognitive Model Built in China
量子位 QbitAI · 49 days ago · Release
QbitAI’s headline says a domestic Chinese team has built a 4B-parameter “cognitive model” suitable for edge deployment. The framing links it to a model direction previously associated with Andrej Karpathy. Since the article body was not provided, details such as the model name, architecture, benchmark results, hardware requirements, open-source status, and licensing remain unverified.
Microsoft's open source tools were hacked to steal passwords of AI developers ★ 78
Hacker News (AI keywords) · 49 days ago · Incident
Microsoft temporarily removed several open source GitHub projects while investigating suspected malicious content. The affected repos were linked to Azure and developer workflows involving AI coding tools such as Claude Code, Gemini CLI, and VS Code. Security researchers said the malware could steal passwords and sensitive credentials when compromised tools were opened, though Microsoft has not disclosed how many users were affected.
Anyone seen benchmarks comparing Gemma 4 4-bit QAT vs. 8-bit standard quants?
r/LocalLLaMA top day · 49 days ago · Benchmark
An r/LocalLLaMA user is looking for benchmarks comparing Unsloth's Gemma 4 4-bit QAT models against standard 8-bit non-QAT quantized models. They understand that QAT is expected to preserve much of the BF16 baseline accuracy, but want hard numbers against traditional 8-bit PTQ. The post highlights scattered feedback but no clear head-to-head evaluation yet.
ggml-webgpu improves prefill speeds for k-quants in llama.cpp PR
r/LocalLLaMA top day · 49 days ago · Benchmark
llama.cpp PR #24225 improves ggml-webgpu matrix multiplication performance for k-quants and refactors matmul paths for Q4/Q5/Q8 and k-quants. In pp512 tests on an M2 Pro, reported speedups range from about 1.33x to 3.78x across Q2_K, Q3_K, Q4_K, Q5_K, and Q6_K. The largest gains appear on Q3_K models, including Qwen and Gemma examples.
Packed twin inference doubles Qwen3.6-27B throughput on one MI50
r/LocalLLaMA top day · 49 days ago · Benchmark
A LocalLLaMA user shared an early packed-twin-inference experiment for local LLM acceleration. The idea resembles speculative decoding, but runs the same quantized model side-by-side instead of using a smaller draft model. On a single AMD MI50, the author reports Qwen3.6-27B improving from 19.4 to 38.1 tok/s (roughly 1.96x), with Q8-or-lower quantization as the main target.
JetBrains Mellum 2: a really good and performant model
r/LocalLLaMA top day · 49 days ago · Benchmark
An r/LocalLLaMA user shared informal impressions of JetBrains Mellum 2, focusing on local coding-style tasks and tool calls. On an AMD Radeon RX 7900 XT with llama.cpp Vulkan and 131K context, the model reportedly generated around 111 tokens/s and stayed above 100 tokens/s near full context. The author stresses this is not a scientific benchmark, but a practical workflow-oriented test.
Omi Med STT v1: Open-Weight Medical ASR Fine-Tuned from Parakeet 0.6B ★ 72
r/LocalLLaMA top day · 49 days ago · Release
Omi Health’s founder says he fine-tuned NVIDIA Parakeet TDT 0.6B v2 for clinical speech and released Omi Med STT v1 under CC-BY-4.0. The runtime supports Mac, Windows, and Linux, auto-selecting MLX, NeMo, or GGUF/parakeet.cpp backends. In the author’s held-out medical benchmark, it reports 2.37% medical-WER and 145× realtime on local A10 compute.
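As background for the error-rate figure, the sketch below computes standard word error rate (word-level edit distance divided by reference length). The summary does not define how the author's "medical-WER" differs from plain WER, so this is only the conventional metric, and the example sentences are made up.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = cur
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
print(wer("patient denies chest pain", "patient denies chess pain"))  # 0.25
```

A domain-specific variant would typically restrict or reweight the count toward clinical terms, which is one plausible reading of "medical-WER", though the source does not confirm it.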
A llama.cpp CLI Command Builder
r/LocalLLaMA top day · 49 days ago · New Tool
An r/LocalLLaMA post introduces a llama.cpp CLI Command Builder with no accounts, email, pop-ups, cookies, or ads. It stores information locally in the browser and includes editable fields for flags and arguments found in the documentation. Users can build CLI or server commands, log run information, and compare which configurations work best for their hardware; only Linux is currently supported.
Pipeline parallelism in llama.cpp may be wasting your VRAM
r/LocalLLaMA top day · 49 days ago · Benchmark
The author compared three llama.cpp Vulkan builds: default 4 sched copies, 1 sched copy, and no pipeline parallelism. In their Qwen GGUF test, input and output throughput were nearly identical across all configurations. However, the default setting used about 1.5GB more VRAM for compute buffers and reduced usable context from roughly 113K tokens to around 88K, though parallel-request benefits were not tested.
