Latest in AI

llama.cpp PR adds MTP support for Gemma-4 E2B and E4B assistants
r/LocalLLaMA top day · 50 days ago · Release
The Reddit post links to ggml-org/llama.cpp Pull Request #24282, which adds MTP support for Gemma-4 E2B and E4B assistants. The submitter frames it as useful for tiny Gemma models on phones, low-end machines, Raspberry Pi, or similarly constrained devices. The post does not include benchmarks, merge status, or setup instructions, so it should be treated as a development signal rather than a finished release.
Introducing FrontierCode ★ 78
Hacker News (AI keywords) · 50 days ago · Benchmark
Cognition launched FrontierCode, a coding benchmark focused on mergeability rather than only functional correctness. It evaluates correctness, tests, scope discipline, style, and repository-specific quality standards. The benchmark was built with open-source maintainers and extensive quality control, and its results show that current frontier models still struggle: Claude Opus 4.8 scores 13.4% on the hardest Diamond subset, ahead of GPT-5.5 and Gemini 3.1 Pro.
Arguing with an AI bot posting outdated Llama 3.1 takes
r/LocalLLaMA top day · 50 days ago · Commentary
An r/LocalLLaMA post jokes about arguing with an AI bot that posted outdated commentary involving Llama 3.1. The author says such bots should enable web search instead of relying on stale knowledge. The post also mocks exaggerated model testimonial posts, using Qwen3.6 27B as a sarcastic example, making it more of a community quality complaint than technical news.
Qwen3.6-35B-A3B Tool Calling Benchmark: ByteShape vs Unsloth GGUFs
r/LocalLLaMA top day · 50 days ago · Benchmark
The post benchmarks eight Qwen3.6-35B-A3B GGUF quants from ByteShape and Unsloth using llama.cpp and tool-eval-bench. It compares f16, q8_0, and q4_0 KV cache quantization under short and long-context pressure, totaling 144 runs and roughly 300 GPU-hours. The author reports no clear ByteShape versus Unsloth winner, q8_0 as close to a free lunch, q4_0 as weaker, and long context as a major tool-calling degradation factor.
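To make the q8_0 versus q4_0 trade-off concrete, here is a minimal sketch of blockwise absmax quantization over 32-value blocks (the block size ggml uses for both formats). It is a simplification, assuming symmetric signed integers and a plain float scale per block; it is not ggml's exact bit layout, and the Gaussian "KV cache" values are synthetic.

```python
import random

BLOCK = 32  # ggml's q8_0 and q4_0 formats both group values in blocks of 32


def quantize_block(xs, bits):
    """Symmetric absmax quantization of one block to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit
    amax = max(abs(x) for x in xs)
    scale = amax / qmax if amax > 0 else 1.0
    q = [max(-qmax, min(qmax, round(x / scale))) for x in xs]
    return scale, q


def dequantize_block(scale, q):
    return [scale * v for v in q]


def rms_error(values, bits):
    """Root-mean-square reconstruction error after block quantization."""
    err = 0.0
    for i in range(0, len(values), BLOCK):
        block = values[i:i + BLOCK]
        scale, q = quantize_block(block, bits)
        recon = dequantize_block(scale, q)
        err += sum((a - b) ** 2 for a, b in zip(block, recon))
    return (err / len(values)) ** 0.5


random.seed(0)
kv = [random.gauss(0.0, 1.0) for _ in range(4096)]  # synthetic stand-in for KV values
e8 = rms_error(kv, 8)
e4 = rms_error(kv, 4)
print(f"8-bit RMS error: {e8:.4f}, 4-bit RMS error: {e4:.4f}")
```

With only 15 levels per block, the 4-bit error is roughly an order of magnitude larger than the 8-bit error. That is one plausible intuition for why q8_0 behaves almost losslessly while q4_0 degrades, though the post's tool-calling results are the actual evidence.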
Was BitNet a dead end? What happened to ternary LLMs?
r/LocalLLaMA top day · 50 days ago · Commentary
An r/LocalLLaMA user questions whether BitNet and ternary LLMs were a dead end after earlier promise around efficient low-bit models. The post notes that the largest ternary model appears to remain around 2B parameters. It asks why frontier open-weight AI labs are not visibly pursuing the approach, but provides no technical evidence or definitive answer.
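For readers unfamiliar with what "ternary" means in practice, this sketch applies the absmean scheme described in the BitNet b1.58 paper: scale weights by their mean absolute value, then round and clip each one to {-1, 0, +1}. The post itself does not discuss the method; the weight values here are arbitrary illustrations, and real implementations apply this per weight matrix during training, not to a short list.

```python
def absmean_ternarize(weights, eps=1e-8):
    """Ternarize weights: divide by mean |w| (gamma), then round and clip to {-1, 0, +1}."""
    gamma = sum(abs(w) for w in weights) / len(weights)
    ternary = [max(-1, min(1, round(w / (gamma + eps)))) for w in weights]
    return gamma, ternary


def dequantize(gamma, ternary):
    """Approximate the original weights as gamma times the ternary codes."""
    return [gamma * t for t in ternary]


w = [0.9, -0.05, 0.4, -1.3, 0.02, 0.6]
gamma, t = absmean_ternarize(w)
print(gamma, t)
```

Each weight then needs about log2(3) ≈ 1.58 bits, and matrix multiplication reduces to additions and subtractions, which is the efficiency argument the post refers to.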
Why Are Cells Small?
Hacker News (AI keywords) · 50 days ago · Tutorial
This essay explains why most cells remain small through two physical limits: surface-area-to-volume ratio and diffusion. As cells grow, volume rises faster than membrane area, making nutrient intake, waste removal, and energy support harder. In larger cells, diffusion also takes longer to bring molecules together, though examples like red blood cells, oocytes, organelles, and giant bacteria show how biology works around these constraints.
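Both limits come down to simple scaling, sketched here for an idealized spherical cell (an assumption; real cells are not spheres). The surface-area-to-volume ratio of a sphere simplifies to 3/r, and a rough one-dimensional diffusion time grows with the square of the distance, t ≈ x²/(2D). The diffusivity argument is a free parameter, not a measured value.

```python
import math


def sphere_sa_to_volume(radius_um):
    """Surface-area-to-volume ratio of a sphere in 1/um; simplifies to 3 / r."""
    area = 4 * math.pi * radius_um ** 2
    volume = (4 / 3) * math.pi * radius_um ** 3
    return area / volume


def diffusion_time(distance_um, diffusivity_um2_per_s):
    """Rough 1D diffusion time t ~ x^2 / (2D): doubling the distance quadruples t."""
    return distance_um ** 2 / (2 * diffusivity_um2_per_s)


for r in (1, 2, 10, 50):
    print(f"r = {r:>2} um: SA:V = {sphere_sa_to_volume(r):.3f} per um")
```

A cell ten times wider has one tenth the membrane per unit of volume, and molecules need roughly a hundred times longer to diffuse across it.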
Gemini 3.5 and Antigravity come to Google NotebookLM
Ars Technica AI · 50 days ago · Release
Google is upgrading NotebookLM with Gemini 3.5 and Antigravity, pushing the product beyond source-based Q&A into more agentic research workflows. The update adds a secure cloud computer for each notebook, enabling code execution, deeper analysis, and richer file outputs. For now, availability is limited to AI Ultra and enterprise customers, with broader rollout planned later.
Apple Core AI Framework ★ 76
Hacker News (AI keywords) · 50 days ago · Release
Apple’s Core AI framework is positioned as a developer stack for deploying AI models directly inside apps on Apple silicon. The documentation describes Swift APIs, `.aimodel` assets, model specialization, caching, Xcode profiling, and debugging tools. It appears aimed at developers building low-latency, privacy-conscious on-device inference workflows, though the documentation is marked as preliminary beta information.
LocalLLaMA post tier list
r/LocalLLaMA top day · 50 days ago · Opinion
The author proposes a tier list for r/LocalLLaMA posts in response to complaints about declining post quality. Top-tier posts include new local model releases with GGUF/MLX or benchmark data, meaningful optimizations, complete hardware performance reports, and well-analyzed research. Low-tier posts include repeated toy benchmarks, unrelated cloud AI chatter, AI-generated slop, and thinly disguised ads for Claude-wrapper startups.
For the 2nd time in weeks, Microsoft packages laced with credential stealer ★ 72
Ars Technica AI · 50 days ago · Incident
Ars Technica reports a second Microsoft-package security incident in weeks, involving 73 packages laced with a credential stealer. The supplied summary says the malware runs as soon as the packages are opened by an AI agent and can self-replicate. The case highlights a growing software supply-chain risk: AI agents that inspect or operate on code may become execution triggers for malicious packages.
Ask HN: What are tools you have made for yourself since the advent of AI?
Hacker News (AI keywords) · 50 days ago · Commentary
This Ask HN post invites the community to share tools they have built for themselves in the AI era. No original discussion content or replies were provided, so only the topic can be assessed. The likely value is inspirational rather than definitive: it may surface personal automation ideas, workflow hacks, and AI-assisted software experiments, but no specific tools or models can be confirmed from the title alone.
When every other post is an AI benchmark, best-model question, or slop app
r/LocalLLaMA top day · 50 days ago · Commentary
This r/LocalLLaMA post is a meme-like complaint about the subreddit’s recent content quality. The author points to repeated AI-generated benchmark reports, recurring “best model” questions, and hastily built apps or engines presented as groundbreaking. It is not a technical release or evidence-based analysis, but it reflects frustration with noise, hype, and low-effort AI-generated discussion in local model communities.
Full Reverse Engineering of the TI-84 Plus Operating System
Hacker News (AI keywords) · 50 days ago · Hardware
This Hacker News item links to an article titled “Full Reverse Engineering of the TI-84 Plus Operating System.” Based on the provided material, the reliable takeaway is that it concerns reverse engineering the OS of Texas Instruments’ TI-84 Plus graphing calculator. The original text was not provided, so specific claims about methods, findings, code, memory layout, or security implications cannot be verified here.
LocalLLaMA post urges users not to join SpaceX, OpenAI, Anthropic IPOs
r/LocalLLaMA top day · 50 days ago · Opinion
A popular r/LocalLLaMA post urges local LLM supporters not to invest in IPOs tied to SpaceX, OpenAI, or Anthropic. The author argues that frontier labs drive up demand and prices for GPUs, RAM, SSDs, HDDs, and NAS hardware, making local inference harder. The post also questions AI company valuations, but its claims are mostly opinion and speculation without cited evidence.
Show HN: Gitdot – a better GitHub, open-source, anti-AI, written in Rust
Hacker News (AI keywords) · 50 days ago · New Tool
Gitdot appeared on Hacker News as a Show HN project claiming to be “a better GitHub.” The title says it is open-source, written in Rust, and explicitly anti-AI. No article body was provided, so details about features, licensing, deployment, maturity, and how it differs from GitHub cannot be confirmed from the source.
An Implementation of NanoQuant: A Flexible Binary Quantization Method
r/LocalLLaMA top day · 50 days ago · New Tool
An r/LocalLLaMA post presents an unofficial PyTorch implementation of NanoQuant, a 2026 post-training quantization method for dense transformers. The method factorizes weights into scaling vectors and binary matrices, then quantizes and fine-tunes blocks sequentially to reduce hardware requirements. Early Qwen3-0.6B and Qwen3-4B experiments are promising for base models, but instruct quality remains weak and highly dependent on calibration data.
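The core idea of "scaling vectors times a binary matrix" can be illustrated with a deliberately simple heuristic. This is a hypothetical sketch, not the NanoQuant algorithm: it takes the binary matrix B as the sign of W and approximates |W| by an outer product of a row-scale vector and a column-scale vector, so W[i][j] ≈ r[i]·c[j]·B[i][j]. It omits NanoQuant's block-wise quantization and fine-tuning, and the random matrix stands in for real weights.

```python
import random


def binary_factorize(W):
    """Approximate W[i][j] ~= r[i] * c[j] * B[i][j] with B in {-1, +1}.

    Hypothetical heuristic (not NanoQuant): B = sign(W), and |W| is approximated
    by the product of per-row and per-column mean magnitudes.
    """
    rows, cols = len(W), len(W[0])
    B = [[1 if w >= 0 else -1 for w in row] for row in W]
    absW = [[abs(w) for w in row] for row in W]
    total_mean = sum(map(sum, absW)) / (rows * cols)
    r = [sum(row) / cols for row in absW]  # row scaling vector
    c = [sum(absW[i][j] for i in range(rows)) / rows / total_mean  # column scaling vector
         for j in range(cols)]
    return r, c, B


def reconstruct(r, c, B):
    return [[r[i] * c[j] * B[i][j] for j in range(len(c))] for i in range(len(r))]


def rel_frobenius_error(W, W_hat):
    num = sum((a - b) ** 2 for ra, rb in zip(W, W_hat) for a, b in zip(ra, rb))
    den = sum(a ** 2 for row in W for a in row)
    return (num / den) ** 0.5


random.seed(1)
W = [[random.gauss(0, 1) for _ in range(64)] for _ in range(64)]
r, c, B = binary_factorize(W)
err = rel_frobenius_error(W, reconstruct(r, c, B))
print(f"relative error: {err:.3f}")
```

Storage drops to one bit per weight plus two short float vectors per matrix. The large residual error of such a naive fit is why methods in this family add calibration-driven optimization and fine-tuning on top.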
I bundled a fully local LLM inside my Unity game
r/LocalLLaMA top day · 50 days ago · Release
A developer shared a Unity game, Simulation Simulator, that bundles a local LLM with no internet, cloud service, or API key required. The game is a campfire chat simulator about DMT, simulation theory, and a monitor-headed friend, with five endings driven by natural AI interaction. The author sees this as a path toward richer NPCs, while noting local TTS and translation are still too slow for smooth gameplay.
NotebookLM’s Gemini 3.5 upgrade adds a cloud computer and help finding sources
The Verge AI · 50 days ago · Release
Google is rolling out broad updates to NotebookLM, its AI-powered note-taking and research app launched in 2023. The app now uses Google’s upgraded Gemini 3.5 model, which the company says should provide more accurate and reliable responses. The update also adds a cloud computer and help finding sources, expanding NotebookLM beyond source-based Q&A into a broader research assistant workflow.
Xiaomi Claims 1,000+ TPS on a 1T Model Using a Standard 8-GPU Server ★ 72
r/LocalLLaMA top day · 50 days ago · Benchmark
Xiaomi announced MiMo-V2.5-Pro-UltraSpeed with TileRT, claiming over 1,000 tokens/s decode speed on a 1-trillion-parameter MoE model. The company says it runs on a single standard 8-GPU commodity node, not wafer-scale or SRAM-heavy specialized hardware. The claimed stack combines FP4 MoE expert quantization, DFlash speculative decoding, and TileRT low-latency inference kernels, but independent validation is still needed.
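The post does not describe Xiaomi's FP4 layout. As a minimal sketch of what FP4 expert quantization involves, assuming the common E2M1 4-bit float (representable magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6) and one plain float scale per block, each weight is scaled and snapped to the nearest representable value. Production FP4 formats also constrain how the per-block scale itself is stored, which this sketch ignores.

```python
FP4_E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def quantize_fp4_block(xs):
    """Map a block of floats to signed E2M1 values sharing one scale."""
    amax = max(abs(x) for x in xs)
    scale = amax / 6.0 if amax > 0 else 1.0  # largest magnitude maps to 6.0
    codes = []
    for x in xs:
        target = abs(x) / scale
        # Nearest representable magnitude; ties go to the smaller value.
        mag = min(FP4_E2M1_MAGNITUDES, key=lambda v: abs(target - v))
        codes.append(mag if x >= 0 else -mag)
    return scale, codes


def dequantize_fp4_block(scale, codes):
    return [scale * c for c in codes]


scale, codes = quantize_fp4_block([0.34, -1.2, 2.4, 0.05])
print(scale, codes)
```

Because the representable values are spaced non-uniformly, small weights keep finer resolution than large ones, unlike the evenly spaced levels of integer 4-bit formats.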
Why are so many young people getting cancer? What researchers do and don't know
Hacker News (AI keywords) · 50 days ago · Commentary
Nature reports that researchers are investigating why more young people are developing cancers once associated mainly with older age. Emerging explanations exist, but the article stresses that causes are likely to differ by tumor type. The visible article metadata frames the issue as cancer, public health, and epidemiology, with many uncertainties still unresolved.
AI Is Slowing Down
Hacker News (AI keywords) · 50 days ago · Commentary
The article argues that generative AI must keep accelerating to justify massive data center, cloud, and GPU commitments. Its author, Zitron, says OpenAI, Anthropic, hyperscalers, and NVIDIA depend on AI services reaching extraordinary revenue levels by 2029-2030. He points to token-based billing, weak ROI visibility, enterprise spending caps, and customer pushback as signs that demand may be cooling before the infrastructure bet can pay off.
Luce Spark: a 35B MoE on a 16 GB GPU, without the offload tax ★ 72
r/LocalLLaMA top day · 50 days ago · New Tool
Luce Spark is an open-source MoE offload system for running 33B-35B A3B models on 16GB-class GPUs. It keeps frequently routed experts on GPU, stores the long tail in system RAM, and swaps cold experts through a bounded async cache. The author reports 13.3 GiB for Qwen3.6 35B-A3B and about 100 tok/s with Spark optimizations, but notes real 16GB GPU testing is still missing.
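The "bounded cache" idea can be sketched with a least-recently-used policy. This is a hypothetical illustration, not Luce Spark's code: resident entries stand in for experts held in GPU memory, `load_fn` stands in for copying an expert from system RAM, and the swap is synchronous for clarity, whereas the project reportedly swaps asynchronously and pins frequently routed experts rather than relying on plain LRU.

```python
from collections import OrderedDict


class ExpertCache:
    """Bounded LRU cache of MoE experts (hypothetical sketch)."""

    def __init__(self, capacity, load_fn):
        self.capacity = capacity
        self.load_fn = load_fn          # fetches an expert's weights on a miss
        self.resident = OrderedDict()   # expert_id -> weights, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, expert_id):
        if expert_id in self.resident:
            self.hits += 1
            self.resident.move_to_end(expert_id)  # mark as most recently used
            return self.resident[expert_id]
        self.misses += 1
        weights = self.load_fn(expert_id)
        self.resident[expert_id] = weights
        if len(self.resident) > self.capacity:
            self.resident.popitem(last=False)     # evict the least recently used expert
        return weights


cache = ExpertCache(capacity=2, load_fn=lambda eid: f"weights-{eid}")
for eid in [0, 1, 0, 2, 0, 1]:  # a toy routing trace
    cache.get(eid)
print(cache.hits, cache.misses, list(cache.resident))
```

The capacity bound is what keeps GPU memory use fixed; the hit rate then depends on how skewed the router's expert choices are, which is why keeping frequently routed experts resident matters.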
World Capitals Voronoi: Redrawing the World Map by Nearest Capital
Hacker News (AI keywords) · 50 days ago · Commentary
Jason Davies’ map divides the world into regions based on the closest national capital rather than political borders. The page says it uses a spherical Voronoi diagram, accounting for Earth’s curvature when computing distances. The data source is Natural Earth’s 1:10m Cultural Vectors for Admin-0 capitals, making this a geography and visualization item, not an AI release.
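Membership in a spherical Voronoi cell is simply "which capital is closest by great-circle distance." The sketch below labels a single point that way using the haversine formula; it does not compute cell boundaries the way the map does, and the three-capital table (approximate coordinates) is a hypothetical stand-in for the Natural Earth dataset.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two (lat, lon) points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical subset of capitals; the real map uses every Admin-0 capital.
CAPITALS = {
    "Paris": (48.8566, 2.3522),
    "Berlin": (52.5200, 13.4050),
    "Madrid": (40.4168, -3.7038),
}


def nearest_capital(lat, lon):
    """Return the capital whose spherical Voronoi cell contains (lat, lon)."""
    return min(CAPITALS, key=lambda name: great_circle_km(lat, lon, *CAPITALS[name]))


print(nearest_capital(45.76, 4.84))  # a point near Lyon
```

Measuring distance on the sphere rather than on a flat projection is what keeps cell boundaries correct near the poles and across the antimeridian.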
OpenEnv coordination expands to HF, PyTorch, Unsloth, Modal, and more
r/LocalLLaMA top day · 50 days ago · New Tool
OpenEnv is a tool for creating agentic execution environments such as terminals, browsers, or other systems an agent can interact with. The project will now be coordinated by a committee including Meta-PyTorch, Reflection, Unsloth, Modal, Prime Intellect, Nvidia, Mercor, Fleet AI, and Hugging Face. The post also lists many AI organizations supporting or adopting OpenEnv, positioning it as infrastructure for open-source agent training.
[3090] Gemma4 QAT + MTP quick TPS numbers
r/LocalLLaMA top day · 50 days ago · Benchmark
An r/LocalLLaMA user shared quick throughput numbers for Gemma4 QAT with MTP speculative decoding on an RTX 3090 24GB setup. They report roughly 1.2-1.8x TPS improvement, with Gemma 4 31B moving from about 40 tok/s to 70-80 tok/s. The author frames this as a rough benchmark covering 11 task categories and notes run-to-run variation from sampling at temperature 1.0.
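Why speedups vary by task can be seen from the idealized formula in the speculative-decoding literature: with k drafted tokens, each accepted independently with probability α, one target-model pass yields on average (1 − α^(k+1)) / (1 − α) tokens. This sketch assumes that i.i.d. acceptance model and ignores drafting overhead, so it is an upper bound, not a model of the post's MTP setup; the α values are illustrative.

```python
def expected_tokens_per_step(alpha, k):
    """Expected tokens per target-model forward pass with k drafted tokens,
    each accepted independently with probability alpha (idealized, no draft cost)."""
    if alpha >= 1.0:
        return k + 1
    return (1 - alpha ** (k + 1)) / (1 - alpha)


for alpha in (0.5, 0.7, 0.8):
    print(alpha, round(expected_tokens_per_step(alpha, k=1), 2))
```

Predictable outputs such as code or boilerplate raise α and push the speedup toward the top of the range, while open-ended text sampled at a high temperature lowers it.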
mtmd adds video input support in llama.cpp ★ 72
r/LocalLLaMA top day · 50 days ago · Release
ggml-org/llama.cpp merged PR #24269, adding video input support to mtmd through mtmd-cli and /chat/completions, which also enables the web UI path. The implementation invokes a locally installed ffmpeg subprocess instead of bundling codec support, and currently extracts visual frames only, with no audio support yet. It was tested with Qwen3-VL-2B in CLI and Gemma 4 E4B in web UI, making local multimodal video experiments more accessible.
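Delegating decoding to an external ffmpeg process typically looks like the sketch below: sample frames at a fixed rate and discard audio. This is a hypothetical Python illustration using standard ffmpeg options (`-i`, `-vf fps=N`, `-an`); the actual llama.cpp implementation is C++, and the post does not say which ffmpeg arguments it passes. The function names and the output pattern are assumptions.

```python
import shutil
import subprocess


def ffmpeg_frame_cmd(video_path, out_pattern, fps=1.0):
    """Build an ffmpeg argv that samples video frames at `fps` and ignores audio."""
    return [
        "ffmpeg",
        "-i", video_path,     # input video
        "-vf", f"fps={fps}",  # keep a fixed number of frames per second
        "-an",                # drop the audio stream
        out_pattern,          # e.g. "frames/frame_%04d.png"; the directory must exist
    ]


def extract_frames(video_path, out_pattern, fps=1.0):
    """Run the locally installed ffmpeg as a subprocess (hypothetical helper)."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(ffmpeg_frame_cmd(video_path, out_pattern, fps), check=True)
```

Relying on a system ffmpeg keeps codec code out of the project, at the cost of an extra runtime dependency that users must install themselves.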
Gemma 4 Chat Template now has preserve thinking
r/LocalLLaMA top day · 50 days ago · Release
A r/LocalLLaMA post notes that Gemma 4’s chat template now has “preserve thinking.” The linked discussion points to google/gemma-4-31B-it on Hugging Face, suggesting a template-level change rather than a new model release or benchmark. The original post does not provide detailed usage notes, defaults, compatibility information, or measured effects.
The crash that vanished: control and emergence in a five-model economy
Hugging Face Blog · 50 days ago · Commentary
With no source text provided, this can only be inferred from the title. The post appears to examine a five-model economy where a potential crash disappears under some form of control or changed system dynamics. Its likely relevance is in multi-agent or multi-model systems, where collective behavior can diverge from individual model behavior.
Google DeepMind RCT in Sierra Leone Shows Gemini's Guided Learning Boosts Education ★ 72
Google DeepMind Blog · 50 days ago · Paper
Google DeepMind released results from a randomized controlled trial (RCT) in Sierra Leone evaluating AI's impact on education. The study found that Gemini’s “Guided Learning” feature, which guides students instead of just giving answers, significantly boosted engagement. This research provides rigorous empirical evidence that AI tutoring can accelerate learning and help bridge educational gaps in resource-constrained regions.
What was your local daily driver for coding last week?
r/LocalLLaMA top day · 50 days ago · Commentary
This r/LocalLLaMA post is a brief community poll asking users what their local coding daily driver was last week. The post asks commenters to share their favorite model and quant, but the provided text does not include poll options, results, or specific model names. Its value is mainly as a community signal for tracking local LLM coding preferences.
