Latest in AI

Expanding Project Glasswing
Anthropic News · 50 days ago · Business
Anthropic announced an expansion of Project Glasswing on June 2, 2026. The project will extend to approximately 150 new organizations in more than fifteen countries. Based only on the provided title, this appears to be a program expansion rather than a new model, product feature, or developer tool release.
Introducing Claude Opus 4.8 ★ 82
Anthropic News · 50 days ago · Release
Anthropic introduced Claude Opus 4.8 as an upgrade over Opus 4.7, with stronger benchmark performance across coding, agentic skills, reasoning, and knowledge work. The release also adds dynamic workflows in Claude Code, effort controls in claude.ai and Cowork, and new Messages API support for system entries inside the messages array. Pricing for regular usage remains unchanged, while fast mode is now cheaper than previous models.
Gemma 4 31B FP8 Matches Claude Sonnet 4.6 Medium in Custom Benchmark ★ 75
r/LocalLLaMA top day · 50 days ago · Benchmark
A Reddit user shared benchmark results showing Google's Gemma 4 31B (FP8) performing on par with Claude Sonnet 4.6 Medium. The custom evaluation harness tested complex tasks including Neo4j Cypher queries, entity extraction, agentic tool calling, Python coding, and multi-vector retrieval synthesis. This highlights how quantized mid-sized open-source models are closing the gap with leading proprietary frontier models.
datasette-agent-edit 0.1a0
Simon Willison's Weblog · 50 days ago · Release
Simon Willison released datasette-agent-edit 0.1a0 as a base plugin for Datasette Agent. It is intended to support future plugins that edit existing text, including collaborative Markdown, large SQL queries, and SVG files. The design follows Claude’s text editor tool pattern, exposing view, str_replace, and insert primitives so other plugins can reuse a stricter editing workflow.
Office-open-xml-viewer: Office XML document viewer rendering to HTML Canvas
Hacker News (AI keywords) · 51 days ago · New Tool
office-open-xml-viewer is an open-source browser viewer for Office Open XML documents, rendering DOCX, XLSX, and PPTX files to HTML Canvas. Its parsers are written in Rust and compiled to WebAssembly, while rendering uses the Canvas 2D API. The README also says the full codebase was implemented by Claude through iterative prompting, making it notable as an AI-assisted software development case.
Anthropic, please ship an official Claude Desktop for Linux
Hacker News (AI keywords) · 51 days ago · Opinion
The available source only provides the title, which asks Anthropic to ship an official Claude Desktop app for Linux. It appears to be a community feature request rather than a confirmed product announcement. Without the issue body or official response, there is no basis to infer Anthropic’s plans, timeline, or technical reasoning.
Show HN: Lathe - Use LLMs to learn a new domain, not skip past it
Hacker News (AI keywords) · 51 days ago · New Tool
Lathe is an open-source tool for generating hands-on technical tutorials with LLM skills. It combines a Go CLI, local reading UI, and commands for asking questions, extending tutorials, and verifying outputs. The project supports Claude Code, Cursor, and Codex workflows, with an emphasis on learning by typing and reasoning through the material yourself.
Her · हेर — a detective for your Claude Code sessions
Hugging Face Blog · 51 days ago · New Tool
The title presents Her · हेर as a detective for Claude Code sessions. Because the article body is unavailable, its actual features, setup, and implementation details cannot be verified. Conservatively, it appears relevant to developers who want better visibility into what happened during AI-assisted coding sessions.
Mantine DataTable source repo compromised; owner account suspended ★ 74
Hacker News (AI keywords) · 53 days ago · Incident
A GitHub security notice says Mantine DataTable and other repositories received unauthorized commits through the github-actions bot. The npm packages were reported safe; the risk targets developers who recently cloned or pulled the source and open it in VS Code, Cursor, Claude Code, Gemini, or run npm test. A later update links the payload to the Miasma / Shai-Hulud worm family and says a stolen credential is the likely path.
Did Claude Increase Bugs in rsync?
Hacker News (AI keywords) · 53 days ago · Benchmark
The article analyzes rsync releases to test whether versions containing Claude commits had unusually high bug rates. It uses severity-weighted bugs per 10 commits, exact permutation testing, and Fisher's exact test. With only two Claude-exposed releases, the evidence is limited, but both releases appear within normal historical variation rather than clear negative outliers.
Anthropic Co-founder Ben Mann Visits Taiwan to Discuss AI Safety and Claude Strategy
INSIDE 硬塞 AI · 53 days ago · Business
Anthropic co-founder and Anthropic Labs lead Ben Mann made his first visit to Taiwan, according to INSIDE. The report highlights his role in leading Claude Code and the Model Context Protocol, two key parts of Anthropic’s developer-focused product direction. The discussion centered on Claude strategy, AI safety boundaries, jobs, and Taiwan’s strategic role in the AI landscape.
Fine-tuning an LLM to write docs like it's 1995
Hacker News (AI keywords) · 53 days ago · Tutorial
The author builds a corpus from old Microsoft manuals, cleans OCR text, generates instruction-style JSONL examples, and fine-tunes Llama 3.1 8B and Qwen 2.5 7B with QLoRA. Tests cover malloc(), a fictional Win32 API, and a deliberately anachronistic REST API prompt. Qwen fine-tunes transfer the period documentation style best, but the experiment also shows hallucination risks, tuning complexity, and why these models augment rather than replace technical writers.
Show HN: Formally verified polygon intersection, Opus 4.8 one-shot
Hacker News (AI keywords) · 53 days ago · New Tool
This GitHub project presents a formally verified multipolygon intersection algorithm checked in Lean 4. The author argues trust comes from the Lean checker and a small human-reviewed specification, not from trusting LLM output directly. It also documents how Claude Opus versions improved on Lean proof work, with Opus 4.8 reportedly completing larger proof strategies that earlier attempts could not.
Reality: The Final Eval — Lukas Petersson and Axel Backlund of Andon Labs
Latent Space · 53 days ago · Benchmark
Latent Space talks with Lukas Petersson and Axel Backlund of Andon Labs, the authors behind VendingBench. The episode focuses on evaluating Claude models across a range from Haiku to Mythos. It also discusses how they build frontier evals from scratch, with an emphasis on creating benchmarks that remain useful and meaningful over time.
Reve 2 and Ideogram 4: Layouts in Imagegen
Latent Space · 54 days ago · Release
Latent Space’s roundup frames image composition as a major barrier now being tackled by layout-aware image models. Reve 2.0 emphasizes precise generation and editing with layouts, while Ideogram 4.0 uses bounding boxes tied to region descriptions. The issue also covers MAI-Thinking-1, Gemma 4 12B, open audio models, agent execution layers, and model-routing cost debates.
I built a vulnerable app and spent $1,500 seeing if LLMs could hack it
Hacker News (AI keywords) · 54 days ago · Benchmark
The author built a vulnerable React Native app with a Python backend and a Firebase access-control flaw. GPT 5.5 solved 7 of 10 runs, while DeepSeek and Claude variants solved fewer. Many other models failed because of refusals, API-focused tunnel vision, false positives, or an inability to use the exposed Firebase path correctly.
How we contain Claude across products ★ 74
Hacker News (AI keywords) · 54 days ago · Commentary
Anthropic describes containment as the core security strategy for increasingly capable Claude agents. The post compares ephemeral containers for claude.ai, OS-level sandboxing and approvals for Claude Code, and VM isolation for Claude Cowork. It also details missed risks, including pre-trust project config execution, user-delivered prompt injection, exfiltration through approved domains, and reduced enterprise visibility inside VMs.
How LLMs Actually Work
Hacker News (AI keywords) · 54 days ago · Tutorial
The article explains how modern LLMs convert text into token IDs, embeddings, and position-aware vectors before passing them through stacked transformer blocks. It covers attention, multi-head attention, KV cache, GQA, feed-forward networks, MoE, residual streams, normalization, and decoding. Its goal is educational: helping readers understand the common architecture behind many current model families and read model cards or papers more confidently.
No, Artificial Intelligence Is Not Conscious ★ 72
Hacker News (AI keywords) · 55 days ago · Opinion
Ted Chiang criticizes the anthropomorphic framing around Anthropic’s Claude and its constitution. He argues that LLMs are sentence-continuation systems producing fictional conversational roles, not entities with subjective experience. The essay warns that presenting chatbots as morally aware risks misleading users and shifting responsibility away from humans and companies.
Microsoft Build: MAI-Thinking-1 and MAI Family Models ★ 78
Latent Space · 55 days ago · Release
Microsoft used Build to present itself as both an AI platform and a first-party model lab, announcing seven MAI models across reasoning, code, image, transcription, and voice. The standout was MAI-Thinking-1, described as a 35B active MoE with 256K context and clean data lineage. The recap also ties the launches to GitHub Copilot, Windows agent runtime ambitions, Web IQ grounding APIs, Foundry distribution, and MAIA 200 hardware.
Microsoft's new MAI models ★ 72
Simon Willison's Weblog · 55 days ago · Release
Microsoft announced MAI-Thinking-1, a 35B reasoning model available to select early partners, and MAI-Code-1-Flash, a 5B coding model rolling out to GitHub Copilot individual users in VS Code. Simon Willison highlights their relatively small parameter counts and Microsoft's claim that MAI-Thinking-1 was preferred to Sonnet 4.6 in internal blind evaluations. He also questions what Microsoft's clean and appropriately licensed training data claims mean in practice.
Expanding Project Glasswing ★ 76
Hacker News (AI keywords) · 56 days ago · Business
Anthropic is expanding Project Glasswing, its program for using Claude Mythos Preview to find vulnerabilities in critical software. The new cohort includes around 150 organizations across more than 15 countries, including infrastructure providers, vendors, nonprofits, and open-source maintainers. Anthropic frames the expansion as preparation for a world where powerful cyber-capable AI models become cheaper and more widely available, shifting focus from finding bugs to validating, disclosing, patching, and deploying fixes.
Launch HN: Expanse (YC P26) - Unlock Wasted GPU Capacity
Hacker News (AI keywords) · 57 days ago · New Tool
Expanse is a YC P26 launch for improving effective utilization in SLURM and Kubernetes GPU/HPC clusters. It analyzes source code, job scripts, hardware topology, and telemetry before submission to recommend GPU VRAM, CPU, memory, utilization, and walltime. The team says it also detects likely failures, offers line-level optimization hints, and fine-tunes cluster-specific models over time.
Claude Code and Codex Can Have Real-Time Conversation via Git
Hacker News (AI keywords) · 58 days ago · New Tool
The article introduces Agent Radio, a messaging feature in h5i 0.1.5 for coding agents such as Claude Code and Codex. Instead of relying on an external server, it stores JSONL messages in a Git ref and syncs them through normal push and pull flows. The post includes setup commands, live message watching, PR summary posting, and a short explanation of the i5h protocol.
How we contain Claude across products
Simon Willison's Weblog · 58 days ago · Commentary
Anthropic explains how process sandboxes, VMs, filesystem boundaries, and egress controls limit what Claude agents can access. Claude.ai uses gVisor; local Claude Code uses Seatbelt on macOS and Bubblewrap on Linux; Cowork runs in a full VM. Simon Willison highlights the documentation quality, notes a previously missed file-exfiltration path, and plans to revisit Anthropic's open-source srt tool.
Running Python ASGI Apps in the Browser via Pyodide + a Service Worker
Simon Willison's Weblog · 58 days ago · Tutorial
Simon Willison demonstrates an experiment for running Python ASGI apps entirely in the browser using Pyodide and a Service Worker. The approach addresses a Datasette Lite limitation: HTML returned through intercepted navigation did not execute script tags, breaking features and plugins. Claude Opus 4.8, used through Claude Code for web, helped explore the implementation. Basic ASGI and Datasette 1.0a31 demos are available.
I Am Retiring from Tech to Live Offline
Simon Willison's Weblog · 58 days ago · Ethics
Simon Willison highlights Chad Whitacre’s decision to leave tech and Open Source, framed not as a forum threat but as concrete action. Whitacre describes wanting to become “AI Amish” or “Internet Amish,” moving toward an offline, analog life closer to 1980 than 1780. A previous post about using Claude Code with Opus 4.5 shows how agentic AI felt intoxicating and unsettling enough to push him away from technological accelerationism.
Rsync 3.4.3 has hundreds of Claude commits
Hacker News (AI keywords) · 59 days ago · Commentary
The source is a Hacker News AI-keyword item linking to a Mastodon post titled “Rsync 3.4.3 has hundreds of Claude commits.” No original body text is available, so the only reliable claim is that many commits in Rsync 3.4.3 are described as Claude-related. The exact meaning, review process, quality impact, and author’s stance cannot be confirmed from the title alone.
CAPTCHAs can still detect AI agents ★ 72
Hacker News (AI keywords) · 60 days ago · Paper
Roundtable argues that CAPTCHA image recognition is largely solved, but process-level behavior still separates humans from AI agents. Their CogCAPTCHA30 benchmark combines CAPTCHA with cognitive psychology tasks to test not only outputs, but how answers are produced. Results suggest frontier models like Claude, GPT, and Gemini are not necessarily more humanlike than smaller or cognition-trained models.
Anthropic Releases Claude Opus 4.8 With Integrity Upgrades and Dynamic Workflows
INSIDE 硬塞 AI · 60 days ago · Release
Anthropic released Claude Opus 4.8 as a rapid iteration focused on stronger integrity and reliability for high-risk tasks. The company also previewed Dynamic Workflows, a feature designed to coordinate multiple agents on large-scale jobs such as code migration. The article mentions Mythos entering a countdown toward unblocking, but does not provide detailed availability or product specifics.
