Latest in AI

olmo-eval: An Evaluation Workbench for the Model Development Loop
Hugging Face Blog | 45 days ago | New Tool
The Hugging Face Blog post announces olmo-eval, described as an evaluation workbench for the model development loop. Based on the title alone, the project appears focused on helping teams evaluate models during iterative development rather than only after release. No article body was provided, so specific features, supported benchmarks, integrations, metrics, or usage details cannot be confirmed.
Open Reproduction of DeepSeek-R1
Hacker News (AI keywords) | 46 days ago | Release
The linked item is a GitHub project titled “Open Reproduction of DeepSeek-R1,” with no article body provided. From the title alone, it appears to be an effort to recreate or document DeepSeek-R1 in an open manner. The main relevance is for researchers and ML engineers interested in reproducible reasoning-model training, evaluation, and open-source alternatives.
Silia: A Tiny Transformer Architecture for Sub-10M Parameter Models
r/LocalLLaMA top day | 47 days ago | Paper
A student from India shared their first paper on r/LocalLLaMA, proposing Silia, a Transformer architecture for extremely small models. The idea is to merge attention-style dynamic mixing with SwiGLU-like nonlinear transformation, aiming to save parameters in models under roughly 10M parameters. The author frames the work as an early, small-scale exploration, limited by old hardware and restricted access to larger compute.
πfs: the data-free filesystem that “stores” data in π
Hacker News (AI keywords) | 47 days ago | New Tool
πfs is an open-source FUSE-style filesystem built around a deliberately absurd idea: data does not need to be stored if it can be located in pi. It records metadata such as file names and positions in pi, then reconstructs content from those locations. The project is more technical humor and conceptual demonstration than practical storage or AI tooling.
Seeking the Best Open-Source Coding AI for an RTX 5070 PC
r/LocalLLaMA top day | 47 days ago | Opinion
A Reddit user on r/LocalLLaMA is looking for the most powerful open-source AI coding model that can run on their Windows 11 desktop. Their system includes an AMD Ryzen 7 7700 CPU, RTX 5070 GPU, and 32GB of DDR5 RAM. The intended use cases are writing, coding, and debugging, but the post itself does not include benchmark results, candidate models, or community recommendations.
DiffusionGemma: 4x faster text generation ★ 74
Google DeepMind Blog | 47 days ago | Release
Google’s DiffusionGemma is an Apache 2.0 experimental open model using text diffusion instead of standard autoregressive decoding. The 26B MoE model activates 3.8B parameters during inference and is designed for low-latency local workflows. Google claims up to 4x faster generation on dedicated GPUs, while noting that output quality is below standard Gemma 4 and production-quality use cases should still prefer Gemma 4.
OpenLumara Creator Challenges Reddit to Hack Its Public Agent Instance
r/LocalLLaMA top day | 48 days ago | Incident
The creator of OpenLumara posted a public challenge asking r/LocalLLaMA users to try breaking into a Discord-hosted instance of the local-model agent. They claimed common prompt-engineering attacks would not work because modules and sandboxes were heavily locked down. The post later listed several successful findings, including missing path traversal protection, an authorization-check bypass, and another undisclosed exploit pending a fix.
Unsloth Gemma 4 QAT MTP assistant models now available
r/LocalLLaMA top day | 48 days ago | Release
An r/LocalLLaMA post notes that Unsloth’s Gemma 4 QAT MTP assistant models are now available in GGUF format. The root directories include q8_0 files named mtp-gemma-4-*.gguf, while MTP folders contain q8_0 and larger quantized variants. The listed releases cover 12B, 26B-A4B, 31B, E2B, E2B mobile, E4B, and E4B mobile it-qat-GGUF repositories.
Single-slot half-height PCIe V100 with NVLink appears in China
r/LocalLLaMA top day | 48 days ago | Hardware
An r/LocalLLaMA post says a Bilibili creator has shown a single-slot, half-height PCIe V100 with NVLink on a custom PCB. The card is described as 16 cm long, passively cooled by default, and capped at 75W, with another version supporting up to 300W. The 16GB model is expected around or below ¥1500, with a 32GB version reportedly planned, but it is not yet available for purchase.
Anyone seen benchmarks comparing Gemma 4 4-bit QAT vs. 8-bit standard quants?
r/LocalLLaMA top day | 49 days ago | Benchmark
An r/LocalLLaMA user is looking for benchmarks comparing Gemma 4 4-bit QAT models, via Unsloth, against standard 8-bit non-QAT quantized models. They understand QAT is expected to preserve much of the BF16 baseline accuracy, but want hard numbers against traditional 8-bit PTQ. The post highlights scattered feedback but no clear head-to-head evaluation yet.
A llama.cpp CLI Command Builder
r/LocalLLaMA top day | 49 days ago | New Tool
An r/LocalLLaMA post introduces a llama.cpp CLI Command Builder with no accounts, email, pop-ups, cookies, or ads. It stores information locally in the browser and includes editable fields for flags and arguments found in the documentation. Users can build CLI or server commands, log run information, and compare which configurations work best for their hardware. Only Linux is currently supported.
llama.cpp PR adds MTP support for Gemma-4 E2B and E4B assistants
r/LocalLLaMA top day | 49 days ago | Release
The Reddit post links to ggml-org/llama.cpp Pull Request #24282, which adds MTP support for Gemma-4 E2B and E4B assistants. The submitter frames it as useful for tiny Gemma models on phones, low-end machines, Raspberry Pi, or similarly constrained devices. The post does not include benchmarks, merge status, or setup instructions, so it should be treated as a development signal rather than a finished release.
Was BitNet a dead end? What happened to ternary LLMs?
r/LocalLLaMA top day | 49 days ago | Commentary
An r/LocalLLaMA user questions whether BitNet and ternary LLMs were a dead end after earlier promise around efficient low-bit models. The post notes that the largest ternary model appears to remain around 2B parameters. It asks why frontier open-weight AI labs are not visibly pursuing the approach, but provides no technical evidence or definitive answer.
An Implementation of NanoQuant: A Flexible Binary Quantization Method
r/LocalLLaMA top day | 49 days ago | New Tool
An r/LocalLLaMA post presents an unofficial PyTorch implementation of NanoQuant, a 2026 post-training quantization method for dense transformers. The method factorizes weights into scaling vectors and binary matrices, then quantizes and fine-tunes blocks sequentially to reduce hardware requirements. Early Qwen3-0.6B and Qwen3-4B experiments are promising for base models, but instruct quality remains weak and highly dependent on calibration data.
Luce Spark: a 35B MoE on a 16 GB GPU, without the offload tax ★ 72
r/LocalLLaMA top day | 49 days ago | New Tool
Luce Spark is an open-source MoE offload system for running 33B-35B A3B models on 16GB-class GPUs. It keeps frequently routed experts on GPU, stores the long tail in system RAM, and swaps cold experts through a bounded async cache. The author reports 13.3 GiB for Qwen3.6 35B-A3B and about 100 tok/s with Spark optimizations, but notes real 16GB GPU testing is still missing.
[3090] Gemma4 QAT + MTP quick TPS numbers
r/LocalLLaMA top day | 49 days ago | Benchmark
An r/LocalLLaMA user shared quick throughput numbers for Gemma 4 QAT with MTP speculative decoding on an RTX 3090 24GB setup. They report a roughly 1.2-1.8x TPS improvement, with Gemma 4 31B moving from about 40 tok/s to 70-80 tok/s. The author frames this as a rough benchmark, using 11 task categories and noting stochastic variation from sampling at temperature 1.0.
What was your local daily driver for coding last week?
r/LocalLLaMA top day | 49 days ago | Commentary
This r/LocalLLaMA post is a brief community poll asking users what their local coding daily driver was last week. The post asks commenters to share their favorite model and quant, but the provided text does not include poll options, results, or specific model names. Its value is mainly as a community signal for tracking local LLM coding preferences.
Leanstral: Open-Source Foundation for Trustworthy Vibe-Coding ★ 76
Mistral AI News | 50 days ago | Release
Mistral AI introduced Leanstral, an open-source code agent designed for Lean 4 and formal proof engineering. The model is available through Apache 2.0 weights, Mistral Vibe, and a Labs API endpoint. Mistral positions it as a cost-efficient alternative for verified coding workflows, with FLTEval benchmarks comparing it against Claude family models and large open-source competitors.
CVPR 2026 Highlights Guangdong as He Kaiming and GDUT Team Stand Out ★ 76
量子位 QbitAI | 50 days ago | Paper
CVPR 2026 named Google DeepMind’s D4RT as Best Paper for fast dynamic 4D scene reconstruction from video. Honorable mentions included Meta’s SAM 3D and NVIDIA’s NitroGen, while TRELLIS.2 won Best Student Paper. The article emphasizes Chinese researcher visibility, ResNet and YOLO receiving the Longuet-Higgins Prize, and a GDUT-led undergraduate-heavy ChordEdit team breaking through among major labs and elite universities.
A Matter Wi-Fi Light Bulb in Rust on the Raspberry Pi Pico 2 W
Hacker News (AI keywords) | 50 days ago | Hardware
This GitHub repository collects Rust Embassy examples for Raspberry Pi Pico 2 and Pico 2 W. Its Matter Wi-Fi light example uses rs-matter, BLE commissioning, and Wi-Fi connectivity so the board can appear as a standard smart bulb in Home Assistant, Apple Home, or Google Home. The project is mainly relevant to embedded Rust and smart-home developers, not AI model users.
"Fully Hallucinated Operating System" Simulates an Entire OS via LLM Prompts
r/LocalLLaMA top day | 50 days ago | Commentary
A popular Reddit post highlights a video demonstrating a "Fully Hallucinated Operating System" run entirely inside an LLM. By prompting the model to act as a terminal, it simulates file systems, network requests, and command execution purely through text generation. While impractical for production, this experiment showcases the impressive state-tracking and "world model" capabilities of modern LLMs.
LLM Research Papers: The 2026 List (January to May)
Ahead of AI (Raschka) | 52 days ago | Paper
Sebastian Raschka compiles a curated reference list of LLM papers he bookmarked from January through May 2026. The list is not comprehensive, but organized around topics useful for future articles, lectures, code examples, and research work. Public sections emphasize reasoning, RL, efficient inference, long context, agent systems, tool use, coding agents, diffusion language models, and serving infrastructure.
Thousand Token Wood: shipping a multi-agent economy on a 3B model
Hugging Face Blog | 52 days ago | Tutorial
Based on the title, this Hugging Face Blog post presents Thousand Token Wood, a project shipping a multi-agent economy on a 3B model. The likely focus is practical system design under small-model constraints, rather than a new frontier-scale model release. Without the original text, details such as the exact model, architecture, benchmarks, code availability, and results cannot be confirmed.
Tiny hackable CUDA language model implementation
Hacker News (AI keywords) | 52 days ago | New Tool
This GitHub project implements a compact generative pretrained transformer as an autoregressive byte-level sequence model. Its README describes causal self-attention, RoPE, feed-forward layers, AdamW, cross-entropy training, and BLAS/OpenBLAS-backed matrix operations, with CUDA toolkit listed in setup steps. It is most useful as an educational and experimental codebase, not as a production-grade replacement for large commercial LLMs.
Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency ★ 72
Hacker News (AI keywords) | 52 days ago | Release
Google released new Gemma 4 checkpoints optimized with Quantization-Aware Training to preserve quality after compression. The release includes Q4_0 checkpoints and a mobile-focused quantization format that can reduce Gemma 4 E2B memory use to about 1GB, or below 1GB for a text-only configuration. The models are available through Hugging Face and supported across llama.cpp, Ollama, LM Studio, LiteRT-LM, Transformers.js, SGLang, vLLM, MLX, and Unsloth.
Ask HN: What is your (AI) dev tech stack / workflow?
Hacker News (AI keywords) | 52 days ago | Commentary
An Ask HN thread asks developers to share their current AI-assisted development setup for upcoming in-person workshops. The author wants guidance for beginners and working developers, with use cases ranging from static sites to FastAPI tools and Linux home automation. Replies cover Claude Code, Cursor, GitHub Copilot, VSCode, spec-driven development, TDD, multi-agent workflows, reviews, and quality control.
Arithmetic Without Numbers: How LLMs Do Math
Hacker News (AI keywords) | 53 days ago | Commentary
The article asks whether LLM arithmetic is memorization, heuristics, real computation, or experimental assistance. It summarizes Rune experiments that decode operations and operands from frozen Llama activations, then route them to Python under a no-parser rule. The strongest supported claim is narrow: activation-derived tool arguments worked in scoped audits, while residual-state JIT replacement, long-number generation, and cross-model transfer remain brittle.
How LLMs Actually Work
Hacker News (AI keywords) | 54 days ago | Tutorial
The article explains how modern LLMs convert text into token IDs, embeddings, and position-aware vectors before passing them through stacked transformer blocks. It covers attention, multi-head attention, KV cache, GQA, feed-forward networks, MoE, residual streams, normalization, and decoding. Its goal is educational: helping readers understand the common architecture behind many current model families and read model cards or papers more confidently.
3D-printable humanoid legs let robotics experiments run wild
Ars Technica AI | 62 days ago | Hardware
Ars Technica reports that Hugging Face has introduced a roughly $2,500 bipedal humanoid robot project built around 3D-printable legs. The effort targets builders and researchers rather than mainstream consumers, lowering the hardware barrier for hands-on robotics experiments. Its broader significance is in open, reproducible embodied AI research, where models and control systems need physical platforms for testing.
Nathan Lambert's Latest Updates: ATOM Report, Post-Training Course, New Book, and Ongoing AI Research ★ 70
Interconnects (Nathan L.) | 104 days ago | Release
Nathan Lambert, a prominent AI expert, former Alignment Scientist at Hugging Face, and founder of the popular newsletter Interconnects, recently wrote about…
