Latest in AI

GPT Generates Original AI Research Findings
量子位 QbitAI | 39 days ago | Paper
OpenAI's GPT model has reportedly generated original AI research results, according to Chinese tech outlet QbitAI. The development suggests GPT may now be capable of autonomous scientific contribution beyond summarization or assistance. If confirmed, this marks a notable step toward AI systems that actively advance rather than merely support research.
GLM-5.2 Passes Vibe Check; Z.ai Forecasts Open Fable by December
Latent Space | 39 days ago | Benchmark
Zhipu AI's GLM-5.2 has passed broad informal community vibe checks, drawing favorable comparisons to GPT-class models and signaling a meaningful quality leap for open-weights AI. Z.ai, the company behind GLM, is additionally forecasting release of an open frontier-tier model — dubbed Open Fable — by December 2026. Together, these developments suggest open models are genuinely competing at the frontier rather than perpetually trailing closed proprietary systems.
OpenAI Breaks Down Codex's Three Ways to Use a Computer
INSIDE 硬塞 AI | 39 days ago | Commentary
OpenAI has articulated three distinct operational modes for its Codex coding agent: local execution, cloud-based execution, and cross-environment collaboration. The framework defines where and how the agent takes action on a developer's machine, establishing clear boundaries around AI execution scope and permissions. This clarification helps teams evaluate whether autonomous agents can be safely and controllably integrated into real-world development workflows.
GLM-5.2 Takes the Top Spot Among Text-Only Open-Weights LLMs ★ 72
Simon Willison's Weblog | 40 days ago | Release
Z.ai has released GLM-5.2, a 753B-parameter MIT-licensed open-weights model with a 1-million-token context window. Independent benchmark site Artificial Analysis ranks it first among open-weights models on their Intelligence Index v4.1, ahead of MiniMax-M3, DeepSeek V4 Pro, and Kimi K2.6. It also places second on Code Arena's WebDev leaderboard behind only Claude Fable 5, despite being text-only, and is available on OpenRouter at $1.40/$4.40 per million input/output tokens.
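To put the quoted OpenRouter rates in concrete terms, here is a minimal cost sketch. Only the two per-million-token prices come from the summary above; the function name and the example workload are hypothetical.

```python
# Prices quoted in the summary above (USD per million tokens on OpenRouter).
INPUT_PRICE_PER_M = 1.40
OUTPUT_PRICE_PER_M = 4.40


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at the quoted rates."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical workload: a 200k-token prompt with a 5k-token reply.
cost = estimate_cost(200_000, 5_000)
print(f"${cost:.4f}")
```

At these rates the input side dominates for long-context prompts: the 200k-token prompt accounts for about $0.28 of the roughly $0.30 total.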
SpaceX to Acquire AI Coding Tool Cursor to Challenge Anthropic and OpenAI ★ 73
Ars Technica AI | 41 days ago | Business
SpaceX has announced plans to acquire Cursor, the widely used AI-powered code editor, in a bid to challenge Anthropic and OpenAI in the developer-tools market. According to the report, the companies believe that neither could compete with the AI giants alone, but that a combined entity can mount a credible challenge. The deal marks SpaceX's push beyond aerospace into commercial AI software, where control of developer tooling carries significant strategic and market leverage.
Critical Copilot Vulnerability Let Hackers Steal 2FA Codes from Users
Ars Technica AI | 42 days ago | Incident
A critical vulnerability in Microsoft Copilot, named SearchLeak, allowed malicious actors to steal two-factor authentication codes from users — among the most sensitive short-lived credentials in any security workflow. The exploit exposes a recurring weakness in LLM-integrated products: AI assistants with broad data access create novel attack surfaces that conventional security models fail to contain. Ars Technica frames the incident as evidence of the AI industry's persistent, systemic inability to get ahead of LLM-specific security threats.
GitHub Copilot CLI for Beginners: Overview of Common Slash Commands
GitHub Blog | 42 days ago | Tutorial
GitHub's official blog publishes a beginner-oriented tutorial on GitHub Copilot CLI, focusing on the slash commands that let users direct the terminal AI agent. The guide targets developers new to AI-assisted command-line workflows, explaining how typed slash directives shape Copilot's behavior in the shell. It serves as an on-ramp for those unfamiliar with Copilot's CLI surface beyond the standard web or IDE experience.
datasette-agent 0.3a0: Write SQL via Natural Language with User Approval
Simon Willison's Weblog | 42 days ago | Release
Version 0.3a0 of datasette-agent introduces `execute_write_sql`, a new tool that translates natural language into write SQL statements and prompts the user to confirm before execution. The `datasette agent chat` terminal mode now supports these approval flows, with three new flags — `--root`, `--yes`, and `--unsafe` — to control permission levels and auto-approval. Together these additions enable fully conversational, autonomous modification of SQLite databases via an AI agent.
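The confirm-before-write pattern described above can be sketched with the standard-library `sqlite3` module. This is an illustration of the general idea only, not datasette-agent's implementation: `run_write_with_approval` and its `approve` callback are hypothetical names, and a real chat loop would prompt the user instead of using hard-coded callbacks.

```python
import sqlite3
from typing import Callable


def run_write_with_approval(
    conn: sqlite3.Connection,
    sql: str,
    approve: Callable[[str], bool],
) -> bool:
    """Execute a write statement only if the approve callback returns True."""
    if not approve(sql):
        return False  # the user declined, so nothing is executed
    with conn:  # commits on success, rolls back if the statement raises
        conn.execute(sql)
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (body TEXT)")

# Hard-coded stand-ins for an interactive yes/no prompt.
rejected = run_write_with_approval(
    conn, "INSERT INTO notes VALUES ('a')", lambda sql: False
)
accepted = run_write_with_approval(
    conn, "INSERT INTO notes VALUES ('b')", lambda sql: True
)
rows = conn.execute("SELECT body FROM notes").fetchall()
```

Passing the approval step in as a callback keeps the gate testable, and an auto-approve mode then amounts to supplying a callback that always returns True.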
Ask HN: Has Anyone Replaced Claude/GPT with a Local Model for Daily Coding?
Hacker News (AI keywords) | 42 days ago | Commentary
A Hacker News community thread poses the question of whether developers have successfully migrated their daily coding workflows away from commercial frontier models like Claude and GPT to locally-run alternatives. The post invites practitioners to share real-world experience with self-hosted or locally deployed language models as coding assistants. It surfaces a growing tension between cost, privacy, and latency offered by local models versus the raw capability of cloud-hosted frontier systems.
Publishing WASM Wheels to PyPI for Use with Pyodide
Simon Willison's Weblog | 44 days ago | Tutorial
Pyodide 314.0 removes a long-standing distribution bottleneck by allowing WebAssembly-compiled Python wheels to be published directly to PyPI, so any package author can now distribute Pyodide-compatible packages without Pyodide team involvement. Previously, the team manually built and hosted over 300 packages. Simon Willison celebrated by publishing luau-wasm — a Lua-based scripting language compiled to WASM — using Codex with GPT-5.5 to automate the packaging workflow.
OpenAI Faces State Attorneys General Investigation ★ 72
TechCrunch AI | 44 days ago | Regulation
OpenAI is facing an investigation from state attorneys general, according to TechCrunch. The article says it is not yet clear which states are involved. Reported areas of inquiry include OpenAI's advertising policies and how the company handles health-related data, suggesting regulators are examining both consumer-facing business practices and sensitive information governance.
AINews: Fable and Mythos Access Suspended Over Cybersecurity Risk ★ 76
Latent Space | 45 days ago | Incident
Anthropic’s Claude Fable 5 and Mythos 5 were abruptly suspended after a US export-control directive tied to a possible jailbreak and national cybersecurity risk. The roundup frames the event as a new “model sovereignty” warning for teams relying on closed frontier APIs. It also covers Kimi-K2.7-Code, MiniMax M3, DeepSWE replacing SWE-Bench Pro, agent-inference benchmarks, sandboxing, and Gemini-SQL2.
U.S. Government Orders Anthropic to Disable Claude Fable 5 and Mythos 5 ★ 78
TechCrunch AI | 45 days ago | Regulation
TechCrunch reports that the U.S. government ordered Anthropic to immediately disable Claude Fable 5 and Claude Mythos 5 worldwide, citing national security concerns. Anthropic says the order appears tied to a claimed narrow jailbreak of Fable 5, but argues the cited capability is already common in other public models. The move highlights a potential backlash against Anthropic’s safety-first messaging around especially powerful AI systems.
US Directive Targets Access to Fable 5 and Mythos 5 ★ 76
Simon Willison's Weblog | 45 days ago | Regulation
Simon Willison comments on Anthropic’s statement that a US government export-control directive requires suspending access to Fable 5 and Mythos 5 for all foreign nationals, including Anthropic employees. Anthropic says the directive cites national security concerns but offers only verbal evidence of a narrow Fable 5 jailbreak. Willison notes that, as of 9:01pm ET, he still had access to Fable through claude.ai and Claude Code.
OpenAI WebRTC Audio Session Adds GPT-Realtime-2 and Document Context
Simon Willison's Weblog | 45 days ago | New Tool
Simon Willison revisited his OpenAI WebRTC Audio Session tool, originally built in December 2024 to test OpenAI’s realtime audio API. The update lets users choose GPT-Realtime-2, a newer realtime voice model OpenAI described as having GPT-5-class reasoning. It also adds a document-context box, allowing users to paste text before starting a browser-based voice session and discuss that material conversationally.
Fable 5 Falls Short of GPT 5.5 on the “Final Exam” for Agents
量子位 QbitAI | 46 days ago | Benchmark
Based only on the provided title, the article appears to discuss an “agent final exam” evaluation comparing Fable 5 with GPT 5.5. The key claim is that Fable 5, despite expectations implied by the wording, did not outperform GPT 5.5. No benchmark design, scores, task types, methodology, or broader conclusions are available from the supplied content.
UN Report Warns AI Could Consume Drinking Water for 1.3 Billion People by 2030 ★ 72
INSIDE 硬塞 AI | 46 days ago | Ethics
INSIDE summarizes a United Nations University report arguing that AI’s environmental cost cannot be measured by carbon alone. The report projects AI-supporting data centers could use 945 TWh of electricity annually by 2030, while cooling water demand may exceed the annual drinking-water needs of 1.3 billion people. It also says inference dominates lifecycle energy use and that concentrated cloud infrastructure deepens global inequality.
Program Claude Code, Codex, Pi and Other Agent Harnesses with AI SDK
Vercel Changelog | 46 days ago | Release
Vercel’s changelog entry says AI SDK can now be used to program agent harnesses including Claude Code, Codex, Pi, and other similar tools. Based on the title alone, the update appears aimed at developers who want a common programming interface around coding agents and AI assistant runtimes. No implementation details, APIs, examples, pricing, availability limits, or supported harness list beyond the named products are provided in the source text.
GitHub Availability Report: May 2026
GitHub Blog | 46 days ago | Incident
GitHub’s May 2026 availability report details nine incidents that degraded core services across github.com, GitHub Actions, pull requests, and GitHub Copilot. The report ties broader reliability pressure to rapidly growing traffic from AI-assisted and agentic development workflows. GitHub says it is shifting more traffic to Azure, isolating major services, improving database safeguards, and strengthening failover for affected Copilot model routes.
Datasette 1.0a33 Adds JSON API Extras for Queries and Rows
Simon Willison's Weblog | 46 days ago | Release
Simon Willison announced Datasette 1.0a33, an alpha release that extends the existing ?_extra= JSON API pattern beyond tables to cover queries and rows. The feature is now documented and presented as a significant step toward Datasette 1.0. Willison also used Claude Fable 5 in Claude Code and GPT-5.5 xhigh in Codex Desktop to build a custom extras API explorer demonstrating the new capability.
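As a rough illustration of the `?_extra=` pattern, the sketch below builds request URLs with the standard library. The host, paths, and extra names (`count`, `columns`) are placeholders, and passing one `_extra` parameter per extra is an assumption; the Datasette documentation is the authority on which extras exist and how they may be combined.

```python
from urllib.parse import urlencode


def with_extras(base_url: str, extras: list[str]) -> str:
    """Append one _extra query parameter per requested extra."""
    separator = "&" if "?" in base_url else "?"
    # doseq=True expands the list into repeated _extra=... pairs.
    query = urlencode({"_extra": extras}, doseq=True)
    return f"{base_url}{separator}{query}"


# Hypothetical table and query endpoints on a placeholder host.
table_url = with_extras("https://example.com/db/table.json", ["count", "columns"])
query_url = with_extras("https://example.com/db.json?sql=select+1", ["columns"])
print(table_url)
print(query_url)
```

The helper only constructs URLs; fetching them and interpreting the JSON depends on the Datasette instance being queried.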
Meshy Launches First 3D AI Agent, Calling It a ChatGPT Moment for 3D Creation
量子位 QbitAI | 47 days ago | New Tool
Meshy has announced what the title describes as the world’s first 3D AI Agent. The report frames the launch as a potential “ChatGPT moment” for 3D creation, suggesting a shift toward more conversational or agentic workflows. Because no article body was provided, details such as capabilities, availability, pricing, benchmarks, and supported formats are not confirmed.
OpenAI mulls slashing prices as it competes with Anthropic for users
Hacker News (AI keywords) | 47 days ago | Business
OpenAI is weighing major price reductions as competitive pressure from Anthropic intensifies in the AI market. The move, reported by the Wall Street Journal, signals that the race for users is increasingly being fought on cost as well as capability. Such a pricing shift could have broad implications for developers, enterprises, and the wider AI industry.
Silia: A Tiny Transformer Architecture for Sub-10M Parameter Models
r/LocalLLaMA top day | 47 days ago | Paper
A student from India shared their first paper on r/LocalLLaMA, proposing Silia, a Transformer architecture for extremely small models. The idea is to merge attention-style dynamic mixing with SwiGLU-like nonlinear transformation, aiming to save parameters in models under roughly 10M parameters. The author frames the work as an early, small-scale exploration, limited by old hardware and restricted access to larger compute.
DeepSeek v4 Coding Scores Clash With Broader Frontier Benchmarks
r/LocalLLaMA top day | 47 days ago | Commentary
A Reddit post questions why DeepSeek v4 can rank near the top of coding leaderboards while CAISI reportedly places it about eight months behind the US frontier. The author argues that both views may be compatible because coding benchmarks measure a narrow, heavily optimized slice of capability. For local users, the bigger question is how quantized DeepSeek v4 variants perform in real agent workflows, tool calls, cybersecurity, and abstract reasoning.
[AINews] Open Models, Model Labs vs Agent Labs, and the Untrainable ★ 72
Latent Space | 47 days ago | Commentary
This AINews issue uses Sarah Guo’s essay as a lens for current AI industry debates: where open models matter, how agent labs differ from model labs, and what cannot be trained away. It also recaps discourse around Anthropic Fable/Mythos, Fable 5’s capabilities, Google’s DiffusionGemma, and maturing agent infrastructure. The central takeaway is that durable value may lie in integration, customer translation, maintenance, and intent rather than model scores alone.
Claude Mythos 5 Released: 50 Million Lines of Code in One Day ★ 74
量子位 QbitAI | 48 days ago | Release
QbitAI says Anthropic introduced Claude Fable 5 for general users and Claude Mythos 5 for a small set of trusted users. The article highlights software engineering, long-context work, native vision, memory, and scientific research capabilities. It also focuses on a safety-routing design where Fable 5 downgrades high-risk requests to Claude Opus 4.8 instead of simply refusing.
First GPT-5.6 tests arrive, targeting Mythos
量子位 QbitAI | 48 days ago | Benchmark
The title indicates that QbitAI is covering the first hands-on tests of GPT-5.6, framed around a comparison with Mythos. Because the article body is unavailable, the testing setup, metrics, task types, and actual performance gap cannot be verified. The item is best treated as an early benchmark or model-comparison report that needs the original article for proper evaluation.
Claude Fable 5 First-Day Hands-On Tests Draw Strong Reactions
量子位 QbitAI | 48 days ago | Benchmark
QbitAI reports that Anthropic’s Claude Fable 5 quickly drew widespread hands-on testing after release. Examples include Minecraft UI generation, Photoshop-like creative tools, browser games, websites, Three.js scenes, and coding tasks. The article highlights impressive demos and benchmark claims, but also notes failures in large codebase refactoring and high usage costs.
Anthropic Claude Fable 5: Mythos-Class Power with Controversial Terms ★ 84
Latent Space | 48 days ago | Release
Anthropic released Claude Fable 5 as its first broadly available Mythos-class model, alongside restricted Mythos 5 access. Benchmarks and ecosystem reports show strong gains in coding, long-horizon agentic tasks, research, and vision. The controversy centers on 30-day retention for Mythos-class traffic and silent interventions that may reduce effectiveness on frontier LLM development tasks, raising trust, reproducibility, and open AI concerns.
GPT-2: Too Dangerous To Release — A 2019 Retrospective
Hacker News (AI keywords) | 48 days ago | Commentary
In 2019, OpenAI staged the release of GPT-2, citing fears it could enable large-scale disinformation and spam generation. The move sparked debate: was it responsible AI safety practice or a savvy PR stunt? Written in late 2022, this blog post revisits the episode now that GPT-2 looks quaint compared to GPT-3/4, asking whether the original fears were justified.
