Latest in AI


Bonsai LM 1-bit and 1.58-bit Benchmarks on Jetson Orin Nano Super
r/LocalLLaMA top day · 48 days ago · Benchmark
A LocalLLaMA post benchmarks five Bonsai LM models, from 1.7B to about 8B parameters, on a $250 Jetson Orin Nano Super 8GB using llama.cpp CUDA. The tests compare 7W, 15W, 25W, and MAXN modes across latency, throughput, energy per token, and thermals. The main takeaway is that 25W is usually the best efficiency/performance point for models up to 4B, while Bonsai-8B may favor 15W for lower power.
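The energy-per-token metric mentioned above is usually derived from average power draw and generation throughput. A minimal sketch follows, assuming that definition; the function name and the sample numbers are hypothetical and are not taken from the post.

```python
def energy_per_token_joules(avg_power_watts: float, tokens_per_second: float) -> float:
    """Joules per generated token = average power (W) / throughput (tokens/s)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return avg_power_watts / tokens_per_second


# Hypothetical (power, throughput) readings per power mode, for illustration only.
# These are NOT measurements from the benchmark post.
samples = {"15W": (12.0, 10.0), "25W": (20.0, 20.0)}

for mode, (watts, tps) in samples.items():
    print(f"{mode}: {energy_per_token_joules(watts, tps):.2f} J/token")
```

With made-up readings like these, a higher power mode can still cost less energy per token if throughput rises faster than power draw, which is the kind of trade-off the post compares across modes.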
MooreThreads Releases MusaCoder-27B Code LLM on Hugging Face
r/LocalLLaMA top day · 48 days ago · Release
MooreThreads, a Chinese GPU semiconductor company best known for its MUSA compute platform, has released MusaCoder-27B on Hugging Face alongside a technical paper on arXiv. The 27B-parameter model is positioned as a code-generation LLM, extending MooreThreads' ambitions beyond hardware into the AI model layer. Its public availability on Hugging Face signals an open-weights approach, making it accessible to local-inference practitioners and researchers evaluating alternatives to Western-origin coding models.
Cohere Releases North Mini Code: Open-Source Agentic Coding Model
r/LocalLLaMA top day · 48 days ago · Release
Cohere has released North Mini Code 1.0, its first open-source agentic coding model, under the permissive Apache 2.0 license. The model has 30 billion total parameters but activates only 3 billion at inference time, suggesting a sparse architecture optimized for efficiency. It scores 33.4 on the Artificial Analysis Coding Index, positioned as competitive among models of comparable size, and is available on Hugging Face.
NotebookLM Upgrades Into an Agent That Proactively Conducts Research ★ 72
INSIDE 硬塞 AI · 48 days ago · Release
Google is upgrading NotebookLM from a note-focused assistant into a research agent capable of multi-step work. The updated tool can analyze across documents, search the web, and help automate broader research workflows. It can also export results into formats such as presentations and documents, making it more useful for students, researchers, educators, and content creators who need to move from source material to finished outputs.
OpenLumara Creator Challenges Reddit to Hack Its Public Agent Instance
r/LocalLLaMA top day · 48 days ago · Incident
The creator of OpenLumara posted a public challenge asking r/LocalLLaMA users to try breaking into a Discord-hosted instance of the local-model agent. They claimed common prompt-engineering attacks would not work because modules and sandboxes were heavily locked down. The post later listed several successful findings, including missing path traversal protection, an authorization-check bypass, and another undisclosed exploit pending a fix.
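For readers unfamiliar with the first finding, path traversal protection generally means refusing any file path that resolves outside an allowed directory. The sketch below shows one common way to do that in Python; it is a generic illustration, not OpenLumara's code, and the function name `resolve_inside` is hypothetical.

```python
from pathlib import Path


def resolve_inside(base_dir: str, user_path: str) -> Path:
    """Resolve user_path under base_dir, rejecting anything that escapes it."""
    base = Path(base_dir).resolve()
    # Joining with an absolute user_path discards base, so the check below
    # also catches inputs such as "/etc/passwd".
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):  # requires Python 3.9+
        raise PermissionError(f"path escapes sandbox: {user_path!r}")
    return candidate
```

Resolving both paths before comparing them matters: a plain string prefix check can be bypassed with `..` segments or symlinks.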
Qwen3.6-MTP-27B on Tesla V100: llama.cpp Throughput Tuning Question
r/LocalLLaMA top day · 48 days ago · Benchmark
A Reddit user is running Qwen3.6-MTP-27B in Q4_K_M GGUF format with llama.cpp server on a 32GB Tesla V100. They report one peak of 55 tokens per second, but typical throughput is closer to 44-48 TPS. The post asks whether settings such as parallelism, speculative MTP draft settings, KV cache quantization, flash attention, and a 262K context window are limiting performance without improving output quality.
Google DeepMind Opens $10M Call for Multi-Agent AI Safety Research
Google DeepMind Blog · 48 days ago · Ethics
Google DeepMind, Schmidt Sciences, the Cooperative AI Foundation, ARIA, and Google.org are backing a funding call of up to $10M for multi-agent AI safety research. The call focuses on risks that arise when many autonomous AI agents interact, coordinate, negotiate, transact, or fail across shared digital environments. Researchers are invited to submit proposals on testbeds, agent networks, infrastructure, oversight, and control by August 8, 2026.
How Useful Is qwopus Compared With Qwen3.6 27B for Coding?
r/LocalLLaMA top day · 48 days ago · Opinion
A Reddit user on r/LocalLLaMA asks for practical comparisons between qwopus and Qwen3.6 27B, specifically for coding work. They note conflicting community opinions, with some users calling qwopus worse and others saying it is much better. In their own simple tests, they did not notice clear differences and want feedback from people using these models for agentic coding.
Cohere Launches North Mini Code: A Lightweight Model for Code Tasks
Cohere Blog · 48 days ago · Release
Cohere has introduced North Mini Code, a smaller, code-specialized variant of its North model family designed for developer use cases. The mini model prioritizes low latency and cost efficiency while retaining strong code completion, debugging, and explanation capabilities. This follows the industry trend of pairing flagship models with lightweight alternatives for high-frequency API usage in enterprise and individual developer contexts.
Charting Local LLM Releases: 2025 Was the Peak, Not 2026
r/LocalLLaMA top day · 48 days ago · Commentary
An r/LocalLLaMA community member shared visualizations tracking the volume of local LLM releases over time. Contrary to the perception that 2026 has been an unusually prolific year, the data indicates that the release peak occurred in 2025. The poster attributes the misperception to outsized quality improvements in 2026, which make the year feel more eventful than the release counts show.
Claude Mythos 5 Released: 50 Million Lines of Code in One Day ★ 74
量子位 QbitAI · 48 days ago · Release
QbitAI says Anthropic introduced Claude Fable 5 for general users and Claude Mythos 5 for a small set of trusted users. The article highlights software engineering, long-context work, native vision, memory, and scientific research capabilities. It also focuses on a safety-routing design where Fable 5 downgrades high-risk requests to Claude Opus 4.8 instead of simply refusing.
Intel Arc Pro B70 GPU Debuts at MPTS2026 for AI Creative Workflows
量子位 QbitAI · 48 days ago · Hardware
Intel presented the Arc Pro B70 GPU at MPTS2026 as a professional GPU for AI-assisted media creation and teaching labs. The article highlights 32GB GDDR6 memory, second-gen Xe² architecture, 32 Xe cores, XMX acceleration, and up to 367 TOPS INT8 performance. Lenovo ThinkStation workstations and GUNNIR’s Arc Pro B70 TF 32G are positioned as ecosystem solutions for local AIGC, rendering, virtual production, and data-sensitive education deployments.
First GPT-5.6 tests arrive, targeting Mythos
量子位 QbitAI · 48 days ago · Benchmark
The title indicates that QbitAI is covering the first hands-on tests of GPT-5.6, framed around a comparison with Mythos. Because the article body is unavailable, the testing setup, metrics, task types, and actual performance gap cannot be verified. The item is best treated as an early benchmark or model-comparison report that needs the original article for proper evaluation.
Inner Mongolia Finds a New Path for an AI Comeback
量子位 QbitAI · 48 days ago · Business
Only the title is available, so the article can only be interpreted cautiously. It appears to discuss Inner Mongolia finding a practical AI development path, possibly framed as a regional comeback. However, no specific company, model, product, infrastructure project, or technical result is provided, so any concrete claims would be speculative.
Former Li Auto AD Chief Launches Embodied AI Startup in Beijing Yizhuang
量子位 QbitAI · 48 days ago · Business
QbitAI reports that Kunlunxing, co-founded by former Li Auto autonomous driving leader Lang Xianpeng and former Alibaba vice president Ren Geng, has settled in Beijing Yizhuang. The startup targets general embodied intelligence, benchmarking itself against Tesla's humanoid robots and building both robot hardware and AI brains. Despite fast hiring, strong investor backing, and a reported unicorn valuation, the article stresses that technical paths, commercialization, and real-world deployment remain uncertain.
Claude Fable 5 and Claude Mythos 5 Announcements
Anthropic News · 48 days ago · Release
Anthropic announced Claude Fable 5 and Claude Mythos 5 on June 9, 2026, positioning them as its next generation of intelligence. The title says the models target difficult knowledge work and coding problems. Since the original article text is unavailable, details such as benchmarks, pricing, API access, model differences, and rollout timing cannot be confirmed.
Gemma 4 12B Unified Audio Loses Speech Attention with Large System Prompts
r/LocalLLaMA top day · 48 days ago · Commentary
A developer building a single-pass voice assistant with Gemma 4 12B unified, an encoder-free audio/vision/text model, finds that audio attention collapses once the system prompt grows to ~21k tokens. The model then ignores the spoken input or hallucinates instead of responding to it. The issue reproduces identically on vLLM, llama.cpp, and LiteRT-LM, pointing to an architectural attention-saturation limit rather than a stack-specific bug.
China Plans 2 Trillion Yuan National AI Computing Network, 80% Domestic-Sourcing Threshold Hits NVIDIA ★ 76
INSIDE 硬塞 AI · 48 days ago · Hardware
China is reportedly preparing to spend about RMB 2 trillion on a nationwide AI compute network. The plan would require 80% domestic sourcing for AI chips and software, aiming to accelerate technological self-reliance and reduce dependence on U.S. suppliers. If implemented, the policy could largely sideline NVIDIA from core deployments and reshape global AI hardware supply chains, including pressure on Taiwanese suppliers.
Without Open Source LLMs, US AI Companies Could Have Monopolized the Technology
r/LocalLLaMA top day · 48 days ago · Opinion
This r/LocalLLaMA post argues that open-source LLMs are an ethical duty because AI has broad social impact. The author worries that without open models, US AI companies could have monopolized access and potentially limited availability to US firms. They also frame China’s release of powerful open-source LLMs as a contribution to humanity, despite political disagreements.
Anthropic Is Accused of Nerfing Fable for Other LLM Development
r/LocalLLaMA top day · 48 days ago · Commentary
An r/LocalLLaMA post claims Anthropic may be intentionally limiting Fable when users ask it to help build other LLMs. The source is a short Reddit post with screenshot context, not a formal benchmark or verified disclosure. Discussion centers on trust in hosted closed models, unclear safety boundaries, and why local or open-weight LLMs may be necessary for serious AI development work.
Unsloth releases GGUF version of Cohere North-Mini-Code 1.0 (30B A3B MoE) on Hugging Face
r/LocalLLaMA top day · 48 days ago · Release
Unsloth uploaded a GGUF version of Cohere's North-Mini-Code 1.0 to Hugging Face, making local inference possible for this 30B A3B MoE coding-focused model. The poster links the release to llama.cpp PR #24260, suggesting new architecture support may be required. No benchmarks or test results have been shared yet; this is an early community resource post.
Anthropic Claude Fable 5: Mythos-Class Power with Controversial Terms ★ 84
Latent Space · 48 days ago · Release
Anthropic released Claude Fable 5 as its first broadly available Mythos-class model, alongside restricted Mythos 5 access. Benchmarks and ecosystem reports show strong gains in coding, long-horizon agentic tasks, research, and vision. The controversy centers on 30-day retention for Mythos-class traffic and silent interventions that may reduce effectiveness on frontier LLM development tasks, raising trust, reproducibility, and open AI concerns.
Rich Sutton on AI Creativity and Discovery
Hacker News (AI keywords) · 48 days ago · Opinion
Reinforcement learning pioneer Rich Sutton posted on Twitter about AI creativity and discovery, touching on one of the field's most debated questions. Known for the influential 'Bitter Lesson,' Sutton consistently argues for general computation-based methods over hand-coded knowledge. Note: original tweet content was not provided; this summary is inferred from the title alone.
Without open LLM competition, closed-source LLM companies will become insatiable
r/LocalLLaMA top day · 48 days ago · Opinion
An r/LocalLLaMA user criticizes closed-source LLM providers, singling out Anthropic and its $200/month users. The post argues that without open-source model competition, proprietary AI companies could become more arrogant and less accountable to customers. The source offers little concrete context beyond an image and opinionated commentary, so it is best read as a community sentiment post rather than a verified product incident.
Releasing Apodex-1.0 Smol Models (0.8B, 2B, 4B Open-Weights) Optimized for Agentic Verification + AgentHarness Evals
r/LocalLLaMA top day · 48 days ago · Release
Apodex 1.0 launches with open-weight models at 0.8B, 2B, and 4B, trained not for general generation but for specialized sub-agent roles—fact-checking external claims and verifying tool call outputs before passing results to a main controller. The design targets long-horizon agent workflows where routing small tasks to lightweight models avoids wasteful use of 70B+ models at every step. AgentHarness, an open-source evaluation framework for local multi-step agent pipelines, is released alongside the weights.
German court rules Google liable for false answers in AI Overviews, declaring them Google's own words ★ 72
Hacker News (AI keywords) · 48 days ago · Regulation
A landmark German court ruling has declared that Google's AI Overviews are legally Google's own words, not neutral third-party aggregations. This makes Google directly liable for false or misleading answers generated by the feature, removing the 'just a tool' defense. The ruling is among the first globally to apply traditional media liability frameworks to generative AI search results.
If Claude Fable 5 Silently Degrades Your Responses, You'll Never Know ★ 73
Simon Willison's Weblog · 48 days ago · Ethics
Anthropic's 319-page Fable 5 system card discloses a silent intervention mechanism that covertly limits model effectiveness for requests related to frontier LLM development, including pretraining pipelines, distributed training infrastructure, and ML accelerator design. Unlike other safeguards, these interventions are invisible to users: they use prompt modification, steering vectors, or PEFT without any warning or fallback. The mechanism is estimated to affect 0.03% of traffic, but critics such as Simon Willison warn that it sets a troubling precedent for AI transparency.
Initial Impressions of Claude Fable 5 ★ 71
Simon Willison's Weblog · 48 days ago · Commentary
Anthropic released Claude Fable 5 and Claude Mythos 5 simultaneously. Fable 5 matches Mythos 5 in capability but adds strict safety classifiers, with new API fallback mechanisms for rejected requests. Both models offer a 1M-token context window, a 128K maximum output, and a January 2026 knowledge cutoff, and are priced at $10/$50 per million tokens, double Opus 4.x. Simon's knowledge-breadth test shows Fable 5 substantially outperforming Opus 4.8, listing dozens of his open-source projects with approximate dates from memory alone.
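To put the quoted pricing in concrete terms, here is a rough cost sketch. It assumes the "$10/$50" figure means $10 per million input tokens and $50 per million output tokens, which is the usual convention but is not stated in the summary; the function name and the example token counts are hypothetical.

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float = 10.0,   # assumed: USD per 1M input tokens
    output_price_per_m: float = 50.0,  # assumed: USD per 1M output tokens
) -> float:
    """Estimate the cost of one request from its token counts."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# Hypothetical request: 200K tokens of context in, 8K tokens out.
print(f"${request_cost_usd(200_000, 8_000):.2f}")
```

Under these assumptions, long-context requests are dominated by input cost, while agentic runs that generate a lot of text are dominated by the higher output rate.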
Furiosa AI inference chip could be a game changer for local LLMs
r/LocalLLaMA top day · 48 days ago · Hardware
An r/LocalLLaMA post discusses Furiosa AI’s RNGD inference chip, citing TSMC 5nm, Hynix HBM3, 48GB VRAM, 1.5TB/s bandwidth, and 180W TDP. The author argues it could matter for local LLM users if Furiosa opens its programming interface and works with llama.cpp on a GGML backend. The post later clarifies that Furiosa is not selling to consumers; this is a wish and market commentary, not a launch.
Hot take: "Vibecoding" is being used for two different things and it causes unnecessary friction
r/LocalLLaMA top day · 48 days ago · Commentary
A Reddit user argues "vibecoding" carries two distinct meanings: throwing code at AI carelessly with no engineering judgment, versus using heavy AI assistance while still maintaining quality standards. Andrej Karpathy's own practice almost certainly fits the second definition, not the first. This semantic ambiguity fuels unnecessary arguments whenever the community debates AI-assisted development quality.
