Latest in AI

GPT Generates Original AI Research Findings
量子位 QbitAI · 39 days ago · Paper
OpenAI's GPT model has reportedly generated original AI research results, according to Chinese tech outlet QbitAI. The development suggests GPT may now be capable of autonomous scientific contribution beyond summarization or assistance. If confirmed, this would mark a notable step toward AI systems that actively advance research rather than merely support it.
OpenAI Recruits AI Pioneer Shazeer and Ex-White House Official Ball Ahead of IPO
INSIDE 硬塞 AI · 39 days ago · Business
In a single week before its anticipated IPO, OpenAI secured two high-profile recruits: Noam Shazeer, a foundational AI researcher and transformer architecture co-inventor, and a former White House official to lead policy strategy. Shazeer's return to the AI frontlines lends OpenAI significant technical credibility, while the policy hire signals serious preparation for regulatory scrutiny. Together, the moves reflect OpenAI's effort to shore up both its engineering reputation and Washington influence ahead of going public.
GLM-5.2 Passes Vibe Check; Z.ai Forecasts Open Fable by December
Latent Space · 39 days ago · Benchmark
Zhipu AI's GLM-5.2 has passed broad informal community vibe checks, drawing favorable comparisons to GPT-class models and signaling a meaningful quality leap for open-weights AI. Z.ai, the company behind GLM, also forecasts the release of an open frontier-tier model, dubbed Open Fable, by December 2026. Together, these developments suggest open models are competing at the frontier rather than perpetually trailing closed proprietary systems.
OpenAI Recruits Transformer Co-Inventor and Trump-Era AI Policy Official Ahead of IPO ★ 70
TechCrunch AI · 39 days ago · Business
OpenAI is accelerating its talent acquisition strategy in the run-up to its IPO, securing two marquee names in a single week. Noam Shazeer, a co-inventor of the Transformer architecture that underlies virtually all modern AI, joins from Google DeepMind. Dean Ball, a former AI policy official from the Trump administration, also joins, signaling OpenAI's intent to strengthen its policy and regulatory positioning alongside its technical bench.
Noam Shazeer Joins OpenAI ★ 74
Hacker News (AI keywords) · 40 days ago · Business
Noam Shazeer, one of the most influential figures in modern AI and a co-author of the landmark 'Attention Is All You Need' paper, has announced he is joining OpenAI. The move represents a major talent acquisition for OpenAI. No further details about his role or terms were provided in the source announcement.
GLM-5.2 Takes the Top Spot Among Text-Only Open-Weights LLMs ★ 72
Simon Willison's Weblog · 40 days ago · Release
Z.ai has released GLM-5.2, a 753B-parameter MIT-licensed open-weights model with a 1-million-token context window. Independent benchmark site Artificial Analysis ranks it first among open-weights models on their Intelligence Index v4.1, ahead of MiniMax-M3, DeepSeek V4 Pro, and Kimi K2.6. It also places second on Code Arena's WebDev leaderboard behind only Claude Fable 5, despite being text-only, and is available on OpenRouter at $1.40/$4.40 per million input/output tokens.
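The OpenRouter prices quoted above translate into per-request cost with simple arithmetic. Below is a minimal Python sketch. It assumes the listed rates of $1.40 per million input tokens and $4.40 per million output tokens stay fixed. The helper name and the example token counts are illustrative only, and actual billing may differ by provider or over time.

```python
# Hypothetical helper: estimate the cost of one GLM-5.2 request on OpenRouter,
# using the per-million-token prices quoted in the item above (assumed fixed).
INPUT_USD_PER_MTOK = 1.40   # quoted input price, USD per 1M tokens
OUTPUT_USD_PER_MTOK = 4.40  # quoted output price, USD per 1M tokens


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated request cost in US dollars."""
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MTOK
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MTOK
    return input_cost + output_cost


if __name__ == "__main__":
    # Example: a 200K-token prompt with a 10K-token reply.
    print(f"${estimate_cost_usd(200_000, 10_000):.3f}")  # → $0.324
```

At these rates, filling the full 1-million-token context with input alone would cost about $1.40 before any output tokens.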
Two-Thirds of Americans Think AI Is Advancing Too Quickly, Pew Research Finds
The Verge AI · 40 days ago · Commentary
A new Pew Research poll reveals a sharp tension in American attitudes toward AI: nearly half of adults now use chatbots at least occasionally, yet nearly two-thirds believe the technology is advancing too quickly. Overall chatbot adoption jumped from 33 percent in 2024 to 49 percent today. ChatGPT has seen especially strong growth, with its usage doubling since 2023.
Leaked Financial Docs Show OpenAI Is Losing Billions of Dollars a Year ★ 71
Ars Technica AI · 41 days ago · Business
Leaked internal financial documents, described as audited accounting records, reveal that OpenAI is losing billions of dollars annually. Although the company's revenue is growing, research and development and other operating costs far outstrip those gains. The disclosure offers a rare, credible window into the financial reality behind the world's most prominent AI lab.
Critical Copilot Vulnerability Let Hackers Steal 2FA Codes from Users
Ars Technica AI · 42 days ago · Incident
A critical vulnerability in Microsoft Copilot, named SearchLeak, allowed malicious actors to steal two-factor authentication codes from users — among the most sensitive short-lived credentials in any security workflow. The exploit exposes a recurring weakness in LLM-integrated products: AI assistants with broad data access create novel attack surfaces that conventional security models fail to contain. Ars Technica frames the incident as evidence of the AI industry's persistent, systemic inability to get ahead of LLM-specific security threats.
datasette-agent 0.3a0: Write SQL via Natural Language with User Approval
Simon Willison's Weblog · 42 days ago · Release
Version 0.3a0 of datasette-agent introduces `execute_write_sql`, a new tool that translates natural language into write SQL statements and prompts the user to confirm before execution. The `datasette agent chat` terminal mode now supports these approval flows, with three new flags (`--root`, `--yes`, and `--unsafe`) to control permission levels and auto-approval. Together, these additions let an AI agent modify SQLite databases conversationally, either with per-statement confirmation or, when auto-approval is enabled, autonomously.
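The approval flow described above follows a common human-in-the-loop pattern. The agent proposes a write statement, and nothing runs until a person confirms it or an explicit auto-approve setting is on. The sketch below shows that pattern with Python's standard-library `sqlite3`. It is not datasette-agent's code. The `run_write_sql` and `ask_in_terminal` functions and the `auto_approve` parameter are hypothetical stand-ins, and the real tool's flags have their own documented semantics.

```python
import sqlite3
from typing import Callable


# Hypothetical sketch of an approval-gated write tool; not datasette-agent's
# actual implementation.
def run_write_sql(
    conn: sqlite3.Connection,
    sql: str,
    approve: Callable[[str], bool],
    auto_approve: bool = False,
) -> bool:
    """Execute a proposed write statement only if it is approved.

    Returns True if the statement ran, False if it was rejected.
    """
    # Ask for confirmation unless auto-approval is enabled.
    if not auto_approve and not approve(sql):
        return False
    with conn:  # commits on success, rolls back if the statement raises
        conn.execute(sql)
    return True


def ask_in_terminal(sql: str) -> bool:
    """Interactive approver: show the proposed SQL and wait for an explicit yes."""
    answer = input(f"Run this statement?\n  {sql}\n[y/N] ")
    return answer.strip().lower() in ("y", "yes")


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE notes (body TEXT)")
    # SQL an agent might propose for "add a note saying hello"
    proposed = "INSERT INTO notes (body) VALUES ('hello')"
    # A rejecting approver leaves the table untouched...
    print(run_write_sql(db, proposed, approve=lambda s: False))  # → False
    # ...while auto-approval runs the statement without asking.
    print(run_write_sql(db, proposed, approve=lambda s: False, auto_approve=True))  # → True
    print(db.execute("SELECT count(*) FROM notes").fetchone()[0])  # → 1
```

In an interactive session, `ask_in_terminal` could be passed as `approve`. The approver is injected as a callable so the same gate works with a terminal prompt, a web dialog, or a test stub.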
Ask HN: Has Anyone Replaced Claude/GPT with a Local Model for Daily Coding?
Hacker News (AI keywords) · 43 days ago · Commentary
A Hacker News community thread asks whether developers have successfully moved their daily coding workflows from commercial frontier models like Claude and GPT to locally run alternatives. The post invites practitioners to share real-world experience with self-hosted language models as coding assistants. It surfaces a growing tension between the cost, privacy, and latency benefits of local models and the raw capability of cloud-hosted frontier systems.
OpenAI Faces Multi-State AG Investigation Amid IPO Push ★ 72
INSIDE 硬塞 AI · 43 days ago · Regulation
A multi-state coalition led by New York's attorney general has issued subpoenas to OpenAI, demanding documents related to ChatGPT's safety, marketing practices, and data handling. The investigation arrives shortly after OpenAI filed for an IPO, introducing significant regulatory headwinds to its public offering timeline. The scrutiny raises fresh questions about whether OpenAI's governance and consumer practices can withstand intensified legal and legislative oversight.
Publishing WASM Wheels to PyPI for Use with Pyodide
Simon Willison's Weblog · 44 days ago · Tutorial
Pyodide 314.0 removes a long-standing distribution bottleneck by allowing WebAssembly-compiled Python wheels to be published directly to PyPI, so any package author can now distribute Pyodide-compatible packages without Pyodide team involvement. Previously, the team manually built and hosted over 300 packages. Simon Willison celebrated by publishing luau-wasm — a Lua-based scripting language compiled to WASM — using Codex with GPT-5.5 to automate the packaging workflow.
AINews: Fable and Mythos Access Suspended Over Cybersecurity Risk ★ 76
Latent Space · 45 days ago · Incident
Anthropic’s Claude Fable 5 and Mythos 5 were abruptly suspended after a US export-control directive tied to a possible jailbreak and national cybersecurity risk. The roundup frames the event as a new “model sovereignty” warning for teams relying on closed frontier APIs. It also covers Kimi-K2.7-Code, MiniMax M3, DeepSWE replacing SWE-Bench Pro, agent-inference benchmarks, sandboxing, and Gemini-SQL2.
U.S. Government Orders Anthropic to Disable Claude Fable 5 and Mythos 5 ★ 78
TechCrunch AI · 45 days ago · Regulation
TechCrunch reports that the U.S. government ordered Anthropic to immediately disable Claude Fable 5 and Claude Mythos 5 worldwide, citing national security concerns. Anthropic says the order appears tied to a claimed narrow jailbreak of Fable 5, but argues the cited capability is already common in other public models. The move highlights a potential backlash against Anthropic’s safety-first messaging around especially powerful AI systems.
US Directive Targets Access to Fable 5 and Mythos 5 ★ 76
Simon Willison's Weblog · 45 days ago · Regulation
Simon Willison comments on Anthropic’s statement that a US government export-control directive requires suspending access to Fable 5 and Mythos 5 for all foreign nationals, including Anthropic employees. Anthropic says the directive cites national security concerns but offers only verbal evidence of a narrow Fable 5 jailbreak. Willison notes that, as of 9:01pm ET, he still had access to Fable through claude.ai and Claude Code.
OpenAI WebRTC Audio Session Adds GPT-Realtime-2 and Document Context
Simon Willison's Weblog · 45 days ago · New Tool
Simon Willison revisited his OpenAI WebRTC Audio Session tool, originally built in December 2024 to test OpenAI’s realtime audio API. The update lets users choose GPT-Realtime-2, a newer realtime voice model OpenAI described as having GPT-5-class reasoning. It also adds a document-context box, allowing users to paste text before starting a browser-based voice session and discuss that material conversationally.
Fable 5 Falls Short of GPT 5.5 on the “Final Exam” for Agents
量子位 QbitAI · 46 days ago · Benchmark
Only the headline is available for this item. It points to an “agent final exam” evaluation comparing Fable 5 with GPT 5.5. According to the headline, Fable 5 did not outperform GPT 5.5, despite the expectations the wording implies. The benchmark design, scores, task types, methodology, and broader conclusions are not available.
UN Report Warns AI Could Consume Drinking Water for 1.3 Billion People by 2030 ★ 72
INSIDE 硬塞 AI · 46 days ago · Ethics
INSIDE summarizes a United Nations University report arguing that AI’s environmental cost cannot be measured by carbon alone. The report projects AI-supporting data centers could use 945 TWh of electricity annually by 2030, while cooling water demand may exceed the annual drinking-water needs of 1.3 billion people. It also says inference dominates lifecycle energy use and that concentrated cloud infrastructure deepens global inequality.
Datasette 1.0a33 Adds JSON API Extras for Queries and Rows
Simon Willison's Weblog · 46 days ago · Release
Simon Willison announced Datasette 1.0a33, an alpha release that extends the existing `?_extra=` JSON API pattern beyond tables to cover queries and rows. The feature is now documented and presented as a significant step toward Datasette 1.0. Willison also used Claude Fable 5 in Claude Code and GPT-5.5 xhigh in Codex Desktop to build a custom extras API explorer demonstrating the new capability.
Silia: A Tiny Transformer Architecture for Sub-10M Parameter Models
r/LocalLLaMA top day · 47 days ago · Paper
A student from India shared their first paper on r/LocalLLaMA, proposing Silia, a Transformer architecture for extremely small models. The idea is to merge attention-style dynamic mixing with SwiGLU-like nonlinear transformation, aiming to save parameters in models under roughly 10M parameters. The author frames the work as an early, small-scale exploration, limited by old hardware and restricted access to larger compute.
DeepSeek v4 Coding Scores Clash With Broader Frontier Benchmarks
r/LocalLLaMA top day · 47 days ago · Commentary
A Reddit post questions why DeepSeek v4 can rank near the top of coding leaderboards while CAISI reportedly places it about eight months behind the US frontier. The author argues that both views may be compatible because coding benchmarks measure a narrow, heavily optimized slice of capability. For local users, the bigger question is how quantized DeepSeek v4 variants perform in real agent workflows, tool calls, cybersecurity, and abstract reasoning.
[AINews] Open Models, Model Labs vs Agent Labs, and the Untrainable ★ 72
Latent Space · 47 days ago · Commentary
This AINews issue uses Sarah Guo’s essay as a lens for current AI industry debates: where open models matter, how agent labs differ from model labs, and what cannot be trained away. It also recaps discourse around Anthropic Fable/Mythos, Fable 5’s capabilities, Google’s DiffusionGemma, and maturing agent infrastructure. The central takeaway is that durable value may lie in integration, customer translation, maintenance, and intent rather than model scores alone.
Claude Mythos 5 Released: 50 Million Lines of Code in One Day ★ 74
量子位 QbitAI · 48 days ago · Release
QbitAI says Anthropic introduced Claude Fable 5 for general users and Claude Mythos 5 for a small set of trusted users. The article highlights software engineering, long-context work, native vision, memory, and scientific research capabilities. It also focuses on a safety-routing design where Fable 5 downgrades high-risk requests to Claude Opus 4.8 instead of simply refusing.
First GPT-5.6 Tests Arrive, Targeting Mythos
量子位 QbitAI · 48 days ago · Benchmark
The title indicates that QbitAI is covering the first hands-on tests of GPT-5.6, framed around a comparison with Mythos. Because the article body is unavailable, the testing setup, metrics, task types, and actual performance gap cannot be verified. The item is best treated as an early benchmark or model-comparison report that needs the original article for proper evaluation.
Anthropic Claude Fable 5: Mythos-Class Power with Controversial Terms ★ 84
Latent Space · 48 days ago · Release
Anthropic released Claude Fable 5 as its first broadly available Mythos-class model, alongside restricted Mythos 5 access. Benchmarks and ecosystem reports show strong gains in coding, long-horizon agentic tasks, research, and vision. The controversy centers on two points: 30-day retention for Mythos-class traffic, and silent interventions that may reduce the model's effectiveness on frontier LLM development tasks. Both raise concerns about trust, reproducibility, and openness in AI development.
GPT-2: Too Dangerous To Release — A 2019 Retrospective
Hacker News (AI keywords) · 48 days ago · Commentary
In 2019, OpenAI staged the release of GPT-2, citing fears it could enable large-scale disinformation and spam generation. The move sparked debate: was it responsible AI safety practice or a savvy PR stunt? Written in late 2022, this blog post revisits the episode now that GPT-2 looks quaint compared to GPT-3/4, asking whether the original fears were justified.
JetBrains Mellum 2: A Really Good and Performant Model
r/LocalLLaMA top day · 49 days ago · Benchmark
An r/LocalLLaMA user shared informal impressions of JetBrains Mellum 2, focusing on local coding-style tasks and tool calls. On an AMD Radeon RX 7900 XT with llama.cpp Vulkan and 131K context, the model reportedly generated around 111 tokens/s and stayed above 100 tokens/s near full context. The author stresses this is not a scientific benchmark, but a practical workflow-oriented test.
Omi Med STT v1: Open-Weight Medical ASR Fine-Tuned from Parakeet 0.6B ★ 72
r/LocalLLaMA top day · 49 days ago · Release
Omi Health’s founder says he fine-tuned NVIDIA Parakeet TDT 0.6B v2 for clinical speech and released Omi Med STT v1 under CC-BY-4.0. The runtime supports Mac, Windows, and Linux, auto-selecting MLX, NeMo, or GGUF/parakeet.cpp backends. In the author’s held-out medical benchmark, it reports 2.37% medical-WER and 145× realtime on local A10 compute.
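Word error rate (WER), the metric family behind the medical-WER figure above, is usually computed in three steps. First, find the word-level edit distance between a reference transcript and the model's output. That distance counts substitutions, deletions, and insertions. Then divide it by the number of reference words. The source does not define medical-WER precisely; it may, for example, score only clinical terms. So the Python sketch below shows plain WER only, as an illustration. `word_error_rate` is a hypothetical helper, not part of Omi Med STT.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Plain WER: (substitutions + deletions + insertions) / reference words.

    Uses a word-level Levenshtein distance computed with dynamic programming.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # Before processing reference word i, prev[j] is the edit distance between
    # the first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "patient denies chest pain and dyspnea"
    hyp = "patient denies chest pain and dysphagia"
    # One substitution out of six reference words.
    print(f"{word_error_rate(ref, hyp):.2%}")  # → 16.67%
```

The reported speed can be read the same way. At 145× realtime, an hour of audio would take roughly 3600 / 145 ≈ 25 seconds to transcribe.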
Introducing FrontierCode ★ 78
Hacker News (AI keywords) · 49 days ago · Benchmark
Cognition launched FrontierCode, a coding benchmark focused on mergeability rather than only functional correctness. It evaluates correctness, tests, scope discipline, style, and repository-specific quality standards. Built with open-source maintainers and extensive quality control, it shows current frontier models still struggle: Claude Opus 4.8 scores 13.4% on the hardest Diamond subset, ahead of GPT-5.5 and Gemini 3.1 Pro.