Latest in AI

Showing: Developers, Open-source

Thousand Token Wood: shipping a multi-agent economy on a 3B model
Hugging Face Blog · 52 days ago · Tutorial
Based on the title, this Hugging Face Blog post presents Thousand Token Wood, a project shipping a multi-agent economy on a 3B model. The likely focus is practical system design under small-model constraints, rather than a new frontier-scale model release. Without the original text, details such as the exact model, architecture, benchmarks, code availability, and results cannot be confirmed.
Hermes Agent – Open-source AI agent with persistent memory
Hacker News (AI keywords) · 52 days ago · New Tool
Hermes Agent is an open-source autonomous agent by Nous Research, designed to run on your own server or machine with persistent local memory. It offers messaging gateways, scheduled automations, browser control, parallel sub-agents, reusable skills, and multiple LLM provider options. The project also targets MLOps and research workflows, including tool-calling trajectory generation, RL experiments, and exportable fine-tuning data.
Tiny hackable CUDA language model implementation
Hacker News (AI keywords) · 53 days ago · New Tool
This GitHub project implements a compact generative pretrained transformer as an autoregressive byte-level sequence model. Its README describes causal self-attention, RoPE, feed-forward layers, AdamW, cross-entropy training, and BLAS/OpenBLAS-backed matrix operations, with CUDA toolkit listed in setup steps. It is most useful as an educational and experimental codebase, not as a production-grade replacement for large commercial LLMs.
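The README lists RoPE among the model's components. As a rough illustration only (not the project's own CUDA code), here is a minimal pure-Python sketch of rotary position embeddings in the common interleaved-pair form, assuming the base of 10000 from the original RoPE formulation; the project may lay out pairs differently.

```python
import math


def rope(vec: list[float], pos: int, base: float = 10000.0) -> list[float]:
    """Rotate each (even, odd) pair of `vec` by an angle proportional to `pos`."""
    d = len(vec)
    if d % 2:
        raise ValueError("RoPE needs an even head dimension")
    out = [0.0] * d
    for i in range(0, d, 2):
        # Pair i/2 rotates at frequency base^(-i/d): early pairs spin fast, late pairs slowly.
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = vec[i], vec[i + 1]
        out[i] = x0 * c - x1 * s
        out[i + 1] = x0 * s + x1 * c
    return out


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    q = [0.3, -1.2, 0.8, 0.5]
    k = [1.0, 0.4, -0.7, 0.2]
    # Both pairs of positions are 3 apart, so the two scores should match.
    print(dot(rope(q, 10), rope(k, 7)), dot(rope(q, 3), rope(k, 0)))
```

Because each pair is rotated rather than shifted, vector norms are preserved and the query-key dot product depends only on the distance between positions, which is why RoPE needs no learned position table.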
Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency ★ 72
Hacker News (AI keywords) · 53 days ago · Release
Google released new Gemma 4 checkpoints optimized with Quantization-Aware Training to preserve quality after compression. The release includes Q4_0 checkpoints and a mobile-focused quantization format that can reduce Gemma 4 E2B memory use to about 1GB, or below 1GB for a text-only configuration. The models are available through Hugging Face and supported across llama.cpp, Ollama, LM Studio, LiteRT-LM, Transformers.js, SGLang, vLLM, MLX, and Unsloth.
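Q4_0 is llama.cpp's simple 4-bit block format: weights are grouped into blocks of 32, and each block stores one scale plus 32 4-bit codes. The sketch below approximates the logic of llama.cpp's reference Q4_0 quantizer in plain Python as an illustration. It is not Google's QAT procedure, which trains the model to tolerate this rounding, and it keeps the scale as a Python float where the file format stores it as fp16.

```python
BLOCK_SIZE = 32  # Q4_0 quantizes weights in blocks of 32


def quantize_block(xs: list[float]) -> tuple[float, list[int]]:
    """Return (scale, 4-bit codes in 0..15) for one block of weights."""
    if len(xs) != BLOCK_SIZE:
        raise ValueError(f"expected {BLOCK_SIZE} values")
    # The signed value with the largest magnitude sets the scale.
    peak = max(xs, key=abs)
    scale = peak / -8.0
    inv = 1.0 / scale if scale else 0.0
    # Shift by 8 so codes are unsigned; adding 0.5 before truncation rounds to nearest.
    codes = [min(15, int(x * inv + 8.5)) for x in xs]
    return scale, codes


def dequantize_block(scale: float, codes: list[int]) -> list[float]:
    # Undo the +8 shift and multiply back by the block scale.
    return [(q - 8) * scale for q in codes]


def bits_per_weight() -> float:
    # 32 codes x 4 bits plus one 16-bit scale, spread over 32 weights.
    return (BLOCK_SIZE * 4 + 16) / BLOCK_SIZE
```

The per-block scale is what keeps the error bounded: each weight is off by at most one quantization step of its own block, and the storage cost works out to 4.5 bits per weight rather than a flat 4.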
Ask HN: What is your (AI) dev tech stack / workflow?
Hacker News (AI keywords) · 53 days ago · Commentary
An Ask HN thread asks developers to share their current AI-assisted development setup for upcoming in-person workshops. The author wants guidance for beginners and working developers, with use cases ranging from static sites to FastAPI tools and Linux home automation. Replies cover Claude Code, Cursor, GitHub Copilot, VSCode, spec-driven development, TDD, multi-agent workflows, reviews, and quality control.
Arithmetic Without Numbers: How LLMs Do Math
Hacker News (AI keywords) · 53 days ago · Commentary
The article asks whether LLM arithmetic is memorization, heuristics, real computation, or experimental assistance. It summarizes Rune experiments that decode operations and operands from frozen Llama activations, then route them to Python under a no-parser rule. The strongest supported claim is narrow: activation-derived tool arguments worked in scoped audits, while residual-state JIT replacement, long-number generation, and cross-model transfer remain brittle.
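The summary describes decoding an operation and its operands from activations and handing them to Python under a no-parser rule. Assuming that rule means the tool receives already-structured values and never parses model-generated text, the hand-off side might look like the hypothetical sketch below. `DecodedCall`, `OPS`, and `route` are illustrative names, and the activation probe itself is out of scope.

```python
import operator
from dataclasses import dataclass

# Whitelisted operations: labels a probe might emit, mapped to Python functions.
OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
}


@dataclass(frozen=True)
class DecodedCall:
    """Hypothetical probe output: an operation label plus two integer operands."""
    op: str
    a: int
    b: int


def route(call: DecodedCall) -> int:
    # No-parser rule (as assumed here): operands must already be integers.
    # Strings are rejected outright, never parsed or eval'd.
    if not all(isinstance(x, int) and not isinstance(x, bool) for x in (call.a, call.b)):
        raise TypeError("operands must be decoded integers, not text")
    try:
        fn = OPS[call.op]
    except KeyError:
        raise ValueError(f"unsupported operation: {call.op!r}") from None
    return fn(call.a, call.b)
```

Rejecting text outright keeps the experiment honest: if the answer is right, the operands must have come from the activations rather than from a parser reading the prompt.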
Fine-tuning an LLM to write docs like it's 1995
Hacker News (AI keywords) · 53 days ago · Tutorial
The author builds a corpus from old Microsoft manuals, cleans OCR text, generates instruction-style JSONL examples, and fine-tunes Llama 3.1 8B and Qwen 2.5 7B with QLoRA. Tests cover malloc(), a fictional Win32 API, and a deliberately anachronistic REST API prompt. Qwen fine-tunes transfer the period documentation style best, but the experiment also shows hallucination risks, tuning complexity, and why these models augment rather than replace technical writers.
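The pipeline in the summary goes from OCR'd manuals to instruction-style JSONL. Below is a minimal sketch of the cleaning and serialization steps. The `instruction`/`output` field names follow a common Alpaca-style convention and are an assumption, as are the two cleanup rules and the prompt template; the post's actual schema and OCR fixes are not given here.

```python
import json
import re


def clean_ocr(text: str) -> str:
    """Light OCR cleanup: rejoin words hyphenated across lines, then collapse whitespace."""
    text = re.sub(r"-\n(\w)", r"\1", text)  # "func-\ntion" -> "function"
    return re.sub(r"\s+", " ", text).strip()


def make_record(topic: str, passage: str) -> dict:
    # Hypothetical prompt template pairing a request with period-style reference text.
    return {
        "instruction": f"Write reference documentation for {topic}.",
        "output": clean_ocr(passage),
    }


def to_jsonl(records: list[dict]) -> str:
    # JSON Lines: one JSON object per line; non-ASCII text is kept as-is.
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)
```

The de-hyphenation rule is deliberately blunt: it also joins legitimate compounds that happen to break at a line end (for example "32-" followed by "bit" becomes "32bit"), so a real corpus would need a dictionary check or manual review.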
Magenta RealTime 2: An Open, Locally Runnable Real-Time Music Model ★ 74
Hacker News (AI keywords) · 53 days ago · Release
Magenta RealTime 2 is an open-weights live music model designed for interactive performance rather than offline prompt-to-song generation. It supports real-time control through MIDI, audio, and text, and can run as standalone apps, DAW plugins, or embedded music software. Google Magenta also released a Python library, C++ MLX inference engine, models, and example applications for musicians and developers.
Open Code Review – An AI-powered code review CLI tool
Hacker News (AI keywords) · 53 days ago · New Tool
Open Code Review appears to be a GitHub-hosted CLI tool focused on AI-assisted code review. Based only on the title, it likely targets developers who want review feedback from the command line or automation workflows. No article body was provided, so model support, language coverage, CI integration, licensing, and review quality cannot be confirmed.
Task-Seeded Synthetic Q&A Generation for Nemotron Pretraining
Hugging Face Blog · 54 days ago · Tutorial
The post appears to focus on generating synthetic Q&A data from task seeds for Nemotron pretraining. Rather than a model launch, it likely emphasizes data generation and pretraining corpus design. Because the original article text is unavailable here, concrete claims about dataset scale, benchmarks, or implementation details should not be inferred.
Reve 2 and Ideogram 4: Layouts in Imagegen
Latent Space · 54 days ago · Release
Latent Space’s roundup frames image composition as a major barrier now being tackled by layout-aware image models. Reve 2.0 emphasizes precise generation and editing with layouts, while Ideogram 4.0 uses bounding boxes tied to region descriptions. The issue also covers MAI-Thinking-1, Gemma 4 12B, open audio models, agent execution layers, and model-routing cost debates.
Designing the hf CLI as an agent-optimized way to work with the Hub
Hugging Face Blog · 54 days ago · Commentary
Based only on the title, this Hugging Face post appears to explain how the hf CLI is being designed for AI agents working with the Hub. It likely focuses on command-line ergonomics, automation, and predictable interactions with Hub resources. Without the full text, specific features, supported agents, or implementation details should not be inferred.
How LLMs Actually Work
Hacker News (AI keywords) · 54 days ago · Tutorial
The article explains how modern LLMs convert text into token IDs, embeddings, and position-aware vectors before passing them through stacked transformer blocks. It covers attention, multi-head attention, KV cache, GQA, feed-forward networks, MoE, residual streams, normalization, and decoding. Its goal is educational: helping readers understand the common architecture behind many current model families and read model cards or papers more confidently.
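To make the KV-cache idea concrete: during decoding, each new token's key and value are appended to a cache, and its query attends over everything cached so far, so earlier keys and values are never recomputed. The pure-Python sketch below (single head, no batching, toy vectors) is a generic illustration rather than the article's code. It checks that incremental decoding matches full causal attention and adds the usual cache-size arithmetic, which shows why GQA, where several query heads share one KV head, shrinks memory.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def attend(q, keys, values):
    """Scaled dot-product attention of one query over the given keys and values."""
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]


def causal_attention(qs, ks, vs):
    # Full recomputation: position t may only see positions 0..t.
    return [attend(qs[t], ks[: t + 1], vs[: t + 1]) for t in range(len(qs))]


class KVCache:
    def __init__(self) -> None:
        self.keys: list[list[float]] = []
        self.values: list[list[float]] = []

    def step(self, q, k, v):
        # Append this token's K/V once; all earlier entries are reused as-is.
        self.keys.append(k)
        self.values.append(v)
        return attend(q, self.keys, self.values)


def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem=2):
    # Factor 2 = one K tensor and one V tensor per layer.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem
```

With illustrative dimensions (32 layers, head dimension 128, 4,096 tokens, 2-byte elements), `kv_cache_bytes` gives 2 GiB when every one of 32 query heads has its own KV head, and 512 MiB when GQA shares 8 KV heads among them.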
Google's Gemma 4 12B is designed to run on 16GB RAM laptops
Ars Technica AI · 55 days ago · Release
Google introduced Gemma 4 12B, an open model aimed at running locally on laptops with 16GB of RAM. The model uses a new encoding scheme and token prediction to improve efficiency relative to its size. Its practical importance depends on real-world benchmarks, but it could lower the barrier for private, offline, and local multimodal AI workflows.
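Whether a model fits in 16GB starts with simple arithmetic on weight storage. The sketch below assumes a nominal 12 billion parameters and counts weights only: no KV cache, activations, or runtime overhead. It says nothing about the unspecified encoding scheme mentioned in the article, and real checkpoint files carry embeddings and metadata that shift these numbers.

```python
def weight_gib(params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in GiB (2**30 bytes)."""
    return params * bits_per_weight / 8 / 2**30


if __name__ == "__main__":
    PARAMS = 12e9  # nominal size taken from the model name; the exact count is an assumption
    for label, bits in [("bf16", 16), ("8-bit", 8), ("4.5-bit (Q4_0-style)", 4.5)]:
        print(f"{label:>22}: {weight_gib(PARAMS, bits):6.2f} GiB")
```

At 16 bits the weights alone come to about 22.4 GiB, more than a 16GB laptop has, while a 4-bit-class format lands near 6.3 GiB and leaves room for the operating system, the KV cache, and activations.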
Direct Preference Optimization Beyond Chatbots
Hugging Face Blog · 55 days ago · Tutorial
Based only on the title, this Hugging Face Blog post appears to discuss Direct Preference Optimization outside conventional chatbot use cases. It may frame DPO as a broader preference-alignment method for model outputs, workflows, or non-conversational AI systems. Without the full article, specific claims about experiments, datasets, models, or implementation details cannot be verified.
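For readers new to DPO, the core objective is compact enough to write out. The sketch implements the standard per-pair DPO loss on summed sequence log-probabilities. The value β = 0.1 is a common default rather than anything taken from the post, and how the article adapts DPO beyond chatbots is not reflected here.

```python
import math


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one preference pair, given summed sequence log-probs."""
    # Implicit reward margin: how much more the policy (relative to the frozen
    # reference model) prefers the chosen response over the rejected one.
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)) == softplus(-margin); this form avoids overflow in exp.
    z = -margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))
```

When the policy equals the reference, the margin is zero and the loss is log 2. Raising the chosen response's log-probability (or lowering the rejected one's) drives it toward zero, which is why DPO needs only preference pairs and a reference model, not a separately trained reward model.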
Show HN: Paseo - Beautiful open-source coding agent interface
Hacker News (AI keywords) · 55 days ago · New Tool
Paseo provides one interface for tools such as Claude Code, Codex, Copilot, OpenCode, and Pi. It runs agents through a local daemon on the user's own machine and supports desktop, mobile, web, and CLI clients. Its appeal is multi-agent orchestration and cross-device control, though real adoption depends on workflow fit, security, and reliability.
NVIDIA Cosmos 3: An Open Omni-model for Physical AI Reasoning and Action
Hugging Face Blog · 57 days ago · Release
Hugging Face Blog announces NVIDIA Cosmos 3, described as the first open omni-model for Physical AI reasoning and action. The title indicates a focus on AI systems that interact with physical-world scenarios rather than only text generation. Because the article body was not provided, its architecture, supported modalities, license, downloadable assets, benchmarks, and deployment requirements cannot be verified from the available material.
I Am Retiring from Tech to Live Offline
Simon Willison's Weblog · 59 days ago · Ethics
Simon Willison highlights Chad Whitacre’s decision to leave tech and Open Source, framed not as a forum threat but as concrete action. Whitacre describes wanting to become “AI Amish” or “Internet Amish,” moving toward an offline, analog life closer to 1980 than 1780. A previous post about using Claude Code with Opus 4.5 shows how agentic AI felt intoxicating and unsettling enough to push him away from technological accelerationism.
As browser wars heat up, top Chrome and Safari alternatives in 2026
TechCrunch AI · 59 days ago · Commentary
TechCrunch frames 2026’s browser competition around alternatives to Chrome and Safari. The roundup covers AI-centric browsers like Perplexity Comet, Dia, Opera Neon, OpenAI Atlas, and Aside, alongside privacy-focused options such as Brave, DuckDuckGo, Ladybird, and Vivaldi. It also highlights niche products including Opera Air, SigmaOS, and Zen Browser, showing how browsers are becoming AI assistants, productivity hubs, privacy layers, and wellness-oriented tools.
Show HN: Tiny-vLLM, a C++ and CUDA LLM Inference Engine
Hacker News (AI keywords) · 60 days ago · New Tool
Tiny-vLLM is a Show HN project described as a high-performance LLM inference engine implemented in C++ and CUDA. From the provided title alone, the project appears aimed at developers or ML engineers interested in GPU-accelerated local or server-side inference. No further claims about supported models, benchmarks, APIs, licensing, deployment targets, or production readiness are stated in the source.
CAPTCHAs can still detect AI agents ★ 72
Hacker News (AI keywords) · 60 days ago · Paper
Roundtable argues that CAPTCHA image recognition is largely solved, but process-level behavior still separates humans from AI agents. Their CogCAPTCHA30 benchmark combines CAPTCHA with cognitive psychology tasks to test not only outputs, but how answers are produced. Results suggest frontier models like Claude, GPT, and Gemini are not necessarily more humanlike than smaller or cognition-trained models.
Has the hunt for AI compute uncovered the next Cerebras?
TechCrunch AI · 61 days ago · Hardware
TechCrunch reports that General Compute has raised a $15 million seed round at a $60 million post-money valuation to build an AI inference neocloud. The company is ordering $300 million of SambaNova SN50 chips, betting they can outperform GPUs and rival specialized chips for inference. The story frames inference speed, deployment flexibility, and lower power needs as key battlegrounds in AI infrastructure.
ESMFold2: The Bitter Lesson Is Coming for Proteins ★ 74
Latent Space · 62 days ago · Commentary
Latent Space interviews Biohub’s Alex Rives about ESMFold2 and the broader ESM protein modeling stack. The discussion centers on datasets versus inductive bias, and whether protein biology is entering its own Bitter Lesson era. The key implication is that large-scale evolutionary sequence data and open models may become foundations for structure prediction, interaction modeling, and programmable biology.
New AI Infra Decacorns: Fireworks, Baseten, and OpenRouter ★ 78
Latent Space · 62 days ago · Business
AI infrastructure startups Fireworks and Baseten have reportedly reached decacorn valuations (US$10 billion or more), reflecting intense investor interest in developer-focused inference and deployment platforms. OpenRouter, the popular LLM API aggregator, is also on a rapid growth trajectory. This funding wave highlights a major capital shift toward cost-effective, developer-friendly API and hosting solutions.
Shipping a Trillion Parameters With a Hub Bucket: Delta Weight Sync in TRL
Hugging Face Blog · 62 days ago · Tutorial
Based on the title, this Hugging Face Blog post focuses on Delta Weight Sync in TRL. It likely discusses moving or synchronizing weight differences at very large model scale using a Hub bucket-related workflow. Without the full article, implementation details, benchmarks, APIs, and stability claims cannot be confirmed.
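Since the summary gives no API details, this is only a conceptual sketch of delta syncing: compare the trainer's current weights with the last version the receiver holds, ship only what changed, and patch on arrival. Tensors are modeled as flat Python lists, and `compute_delta` and `apply_delta` are hypothetical names, not TRL or Hub functions. Transport (the title suggests a Hub bucket) is left out.

```python
Weights = dict[str, list[float]]
# Per tensor: ("full", values) for new or resized tensors,
# or ("sparse", [(index, new_value), ...]) for in-place changes.
Delta = dict[str, tuple[str, list]]


def compute_delta(old: Weights, new: Weights) -> Delta:
    delta: Delta = {}
    for name, values in new.items():
        prev = old.get(name)
        if prev is None or len(prev) != len(values):
            delta[name] = ("full", list(values))  # receiver has nothing to patch
            continue
        changed = [(i, v) for i, (p, v) in enumerate(zip(prev, values)) if p != v]
        if changed:  # unchanged tensors are skipped entirely
            delta[name] = ("sparse", changed)
    return delta


def apply_delta(weights: Weights, delta: Delta) -> Weights:
    patched = {name: list(vals) for name, vals in weights.items()}  # leave input untouched
    for name, (kind, payload) in delta.items():
        if kind == "full":
            patched[name] = list(payload)
        else:
            for i, v in payload:
                patched[name][i] = v
    return patched
```

The sketch ships the changed values themselves rather than arithmetic differences, because `old + (new - old)` is not guaranteed to equal `new` in floating point, and a receiver doing inference should hold bit-identical weights. Deleted tensors are not handled.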
Reachy Mini goes fully local
Hugging Face Blog · 62 days ago · Hardware
Hugging Face published a tutorial for running Reachy Mini conversations without cloud audio processing or API keys. The setup uses its speech-to-speech library as a cascaded VAD, STT, LLM, and TTS pipeline exposed through a Realtime API-compatible WebSocket. Recommended defaults include llama.cpp with Gemma 4, Silero VAD, Parakeet-TDT, and Qwen3-TTS, while allowing swaps to vLLM, MLX, Transformers, or hosted Responses API providers.
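A cascaded pipeline chains independent models, with each stage consuming the previous one's output. The sketch below shows that control flow using plain callables as stand-ins for the models. It is not the speech-to-speech library's API, it treats a single non-speech chunk as the end of an utterance (real VADs use silence timeouts), and it omits the Realtime API-compatible WebSocket layer.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class CascadedPipeline:
    """Hypothetical cascade: buffer speech, then run STT -> LLM -> TTS per utterance."""
    is_speech: Callable[[bytes], bool]   # VAD stage (the post's default: Silero VAD)
    transcribe: Callable[[bytes], str]   # STT stage (e.g. Parakeet-TDT)
    respond: Callable[[str], str]        # LLM stage (e.g. Gemma 4 served by llama.cpp)
    synthesize: Callable[[str], bytes]   # TTS stage (e.g. Qwen3-TTS)
    _buffer: bytearray = field(default_factory=bytearray, init=False, repr=False)

    def feed(self, chunk: bytes) -> Optional[bytes]:
        """Feed one audio chunk; return synthesized reply audio when an utterance ends."""
        if self.is_speech(chunk):
            self._buffer.extend(chunk)  # user still talking: keep accumulating
            return None
        if not self._buffer:
            return None  # silence with nothing pending
        utterance = bytes(self._buffer)
        self._buffer.clear()
        # Each stage consumes the previous stage's output: audio -> text -> text -> audio.
        return self.synthesize(self.respond(self.transcribe(utterance)))
```

Keeping each stage behind its own callable is what makes the swaps described in the tutorial cheap: moving the LLM from llama.cpp to vLLM or MLX changes only `respond`, while the VAD, STT, and TTS stages stay untouched.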
Millions of AI agents imperiled by critical vulnerability in open source package ★ 78
Ars Technica AI · 62 days ago · Incident
Ars Technica reports that Starlette, a Python package with about 325 million weekly downloads, has a critical vulnerability called BadHost. The flaw can let crafted Host headers confuse request.url.path, potentially bypassing middleware-based path authorization. AI infrastructure using FastAPI or Starlette, including vLLM, LiteLLM, MCP servers, LLM proxies, and agent frameworks, should upgrade Starlette and audit custom middleware.
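The report does not spell out the exact BadHost payload, so this plain-Python sketch only illustrates the general failure mode it describes. A path taken from a URL that was rebuilt using an attacker-controlled Host header can differ from the path the server actually routes. The code does not use Starlette and is not its implementation; the helper names and the `/admin` prefix are hypothetical, and the real fix remains upgrading Starlette as the article advises.

```python
from urllib.parse import urlsplit

PROTECTED_PREFIXES = ("/admin",)  # hypothetical protected area


def path_from_rebuilt_url(scope: dict) -> str:
    """Risky pattern: rebuild a URL from the Host header, then re-parse its path."""
    headers = {k.lower(): v for k, v in scope["headers"]}
    host = headers.get(b"host", b"").decode("latin-1")
    return urlsplit(f"{scope['scheme']}://{host}{scope['path']}").path


def path_from_scope(scope: dict) -> str:
    """Safer pattern: use the path the ASGI server parsed from the request line."""
    return scope["path"]


def is_allowed(path: str) -> bool:
    return not path.startswith(PROTECTED_PREFIXES)


if __name__ == "__main__":
    # A Host header ending in "?" turns the real path into a query string.
    scope = {"type": "http", "scheme": "http", "path": "/admin",
             "headers": [(b"host", b"example.com/public?")]}
    print(path_from_rebuilt_url(scope), is_allowed(path_from_rebuilt_url(scope)))
    print(path_from_scope(scope), is_allowed(path_from_scope(scope)))
```

Here the rebuilt URL becomes `http://example.com/public?/admin`, whose parsed path is `/public`, so the check passes even though the request targets `/admin`. This is why the audit advice focuses on custom middleware that makes authorization decisions from a host-derived URL.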
3D-printable humanoid legs let robotics experiments run wild
Ars Technica AI · 63 days ago · Hardware
Ars Technica reports that Hugging Face has introduced a roughly $2,500 bipedal humanoid robot project built around 3D-printable legs. The effort targets builders and researchers rather than mainstream consumers, lowering the hardware barrier for hands-on robotics experiments. Its broader significance is in open, reproducible embodied AI research, where models and control systems need physical platforms for testing.
Some ideas for what comes next, May 2026
Interconnects (Nathan L.) · 63 days ago · Commentary
Nathan Lambert argues that 2026 AI progress is becoming higher-stakes, with model capabilities, work patterns, economics, and real-world risks all escalating. He says open models still lack a true Claude Code and Opus 4.5-style agent moment, and Gemini has no clear competitor to Claude Code or Codex yet. The essay also tracks Mythos, American open-model momentum, frontier-lab competition, and mounting intervention from governments and other power structures.
Specialization Beats Scale: The Key Strategic Variable Most AI Procurement Decisions Overlook ★ 75
Hugging Face Blog · 67 days ago · Opinion
In the current wave of enterprise AI adoption, most decision-makers fall prey to the "scale myth" when making AI procurement decisions: the belief that the…
