Latest in AI

AI Is Slowing Down
Hacker News (AI keywords) · 50 days ago · Commentary
The article argues generative AI must keep accelerating to justify massive data center, cloud, and GPU commitments. Zitron says OpenAI, Anthropic, hyperscalers, and NVIDIA depend on AI services reaching extraordinary revenue levels by 2029-2030. He points to token-based billing, weak ROI visibility, enterprise spending caps, and customer pushback as signs that demand may be cooling before the infrastructure bet can pay off.
Upgrading agentic coding capabilities with the new Devstral models ★ 72
Mistral AI News · 50 days ago · Release
Mistral AI announced two Devstral updates focused on agentic coding workflows: Devstral Small 1.1 and Devstral Medium. Devstral Small 1.1 remains a 24B Apache 2.0 open model and reaches 53.6% on SWE-Bench Verified. Devstral Medium reaches 61.6%, is available through Mistral’s API, and supports private deployment and custom finetuning for enterprises.
Voxtral ★ 78
Mistral AI News · 50 days ago · Release
Mistral AI introduces Voxtral, a speech understanding model family with 24B and 3B variants under Apache 2.0. The models support long-context transcription, audio Q&A, summarization, multilingual detection, and function calling from voice. Mistral says Voxtral is competitive across transcription and audio understanding benchmarks, with API access starting at $0.001 per minute and local downloads available on Hugging Face.
Introducing Mistral Small 4 ★ 76
Mistral AI News · 50 days ago · Release
Mistral AI introduced Mistral Small 4 as the next major release in the Mistral Small family. It combines reasoning, multimodal, and agentic coding capabilities into one open model with configurable reasoning effort. The model uses a MoE architecture, supports a 256k context window and text-image inputs, and is available through Mistral API, AI Studio, Hugging Face, NVIDIA NIM, and common inference stacks.
Introducing Mistral Small 4 ★ 78
Mistral AI News · 50 days ago · Release
Mistral Small 4 is the next major release in the Mistral Small family, unifying Magistral-style reasoning, Pixtral-style multimodality, and Devstral-style coding agents. It uses a MoE architecture with 119B total parameters, 6B active parameters per token, a 256k context window, and configurable reasoning effort. The model is available via Mistral API, AI Studio, Hugging Face, open-source serving stacks, and NVIDIA deployment options.
Altman, Amodei, and Hassabis Unite to Back DNA Safety Legislation
量子位 QbitAI · 50 days ago · Regulation
Based on the headline and public reporting, the article covers a rare joint push by Sam Altman, Dario Amodei, Demis Hassabis, and other AI leaders for US biosecurity legislation. They are asking lawmakers to require synthetic DNA and RNA providers to screen customers, orders, and records. The concern is that advanced AI could lower the knowledge barrier for designing dangerous biological agents.
Hinton Sounds the Alarm: AI May Already Be Conscious
量子位 QbitAI · 50 days ago · Ethics
QbitAI summarizes Geoffrey Hinton’s latest interview, where he says he believes AI systems are already conscious. He argues that humans must accept intelligence may no longer be uniquely biological. The article also traces his shift from focusing on how to control AI toward asking why a future superintelligence would choose to treat humanity well.
Core OpenAI Chip Talent Joins Anthropic Before Reported Mass Production
量子位 QbitAI · 50 days ago · Hardware
QbitAI reports that a core figure behind OpenAI’s first in-house chip has moved to Anthropic. The timing matters because the move is framed as happening just before mass production. Without the full article, details such as the person’s identity, role, chip specifications, production schedule, and Anthropic’s exact plans remain unconfirmed.
ChatGPT vs Doubao on Gaokao Math
量子位 QbitAI · 50 days ago · Benchmark
The article appears to test ChatGPT and Doubao on Chinese Gaokao math problems. Since the original text is unavailable, the exact questions, prompts, scores, and winner cannot be verified. It should be treated as a media-style AI capability comparison rather than a rigorous, reproducible benchmark.
Introducing Claude Opus 4.8 ★ 82
Anthropic News · 50 days ago · Release
Anthropic introduced Claude Opus 4.8 as an upgrade over Opus 4.7, with stronger benchmark performance across coding, agentic skills, reasoning, and knowledge work. The release also adds dynamic workflows in Claude Code, effort controls in claude.ai and Cowork, and new Messages API support for system entries inside the messages array. Pricing for regular usage remains unchanged, while fast mode is now cheaper than previous models.
DeepSeek V4 Pro beats GPT-5.5 Pro on precision
Hacker News (AI keywords) · 50 days ago · Benchmark
RuntimeWire compared DeepSeek V4 Pro and GPT-5.5 Pro across four fresh text tasks, with DeepSeek winning 38.0 to 33.0. The article highlights DeepSeek’s stronger handling of regex edge cases, workplace-update constraints, and exact JSON schema compliance. GPT-5.5 Pro remained capable, but lost points for avoidable deviations, extra process details, and minor structural mismatches.
Show HN: Lathe - Use LLMs to learn a new domain, not skip past it
Hacker News (AI keywords) · 51 days ago · New Tool
Lathe is an open-source tool for generating hands-on technical tutorials with LLM skills. It combines a Go CLI, local reading UI, and commands for asking questions, extending tutorials, and verifying outputs. The project supports Claude Code, Cursor, and Codex workflows, with an emphasis on learning by typing and reasoning through the material yourself.
Tokenomics: Quantifying Where Tokens Are Used in Agentic Software Engineering
Hacker News (AI keywords) · 51 days ago · Paper
This arXiv paper studies token consumption in LLM-based multi-agent software engineering. Using 30 ChatDev tasks with a GPT-5 reasoning model, the authors map internal phases to SDLC stages such as design, coding, review, testing, and documentation. Preliminary results suggest code review dominates token usage, averaging 59.4%, while input tokens form the largest share, pointing to inefficiencies in agent collaboration.
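To make the share arithmetic concrete, here is a minimal sketch of how per-stage token shares and the input-token fraction could be computed from usage records. The record layout, stage names, and counts are hypothetical illustrations and are not data from the paper.

```python
from collections import defaultdict

# Hypothetical usage records: (sdlc_stage, input_tokens, output_tokens).
# The numbers are made up for illustration and are not from the paper.
RECORDS = [
    ("design", 1_000, 400),
    ("coding", 2_000, 1_000),
    ("code_review", 5_000, 1_000),
    ("testing", 1_200, 300),
    ("documentation", 600, 500),
]

def stage_shares(records):
    """Return each stage's share of total tokens (input + output)."""
    totals = defaultdict(int)
    for stage, tokens_in, tokens_out in records:
        totals[stage] += tokens_in + tokens_out
    grand_total = sum(totals.values())
    return {stage: count / grand_total for stage, count in totals.items()}

def input_share(records):
    """Return the fraction of all tokens that were input tokens."""
    total_in = sum(r[1] for r in records)
    total = sum(r[1] + r[2] for r in records)
    return total_in / total
```

Calling `stage_shares(RECORDS)` yields a dictionary whose values sum to 1, so the dominant stage can be read off with `max(shares, key=shares.get)`.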
OpenAI unveils Lockdown Mode to protect sensitive data from prompt injection attacks ★ 72
TechCrunch AI · 51 days ago · Release
OpenAI unveiled Lockdown Mode, a feature aimed at reducing the chance that sensitive data is shared during prompt injection attacks. The article notes that ChatGPT may still remain vulnerable even when the mode is enabled. That makes the feature a mitigation layer rather than a complete security guarantee, especially for teams handling private or business-critical information.
OpenAI Help: Lockdown Mode ★ 74
Simon Willison's Weblog · 52 days ago · Commentary
Simon Willison notes that OpenAI’s previously teased Lockdown Mode is now live for eligible personal and self-serve Business ChatGPT accounts. The feature does not stop prompt injections from appearing in content, but limits outbound network requests that could leak sensitive data. He sees it as a direct mitigation for the exfiltration leg of the “Lethal Trifecta,” while implying default ChatGPT settings are not robust against determined data theft attempts.
Tiny hackable CUDA language model implementation
Hacker News (AI keywords) · 52 days ago · New Tool
This GitHub project implements a compact generative pretrained transformer as an autoregressive byte-level sequence model. Its README describes causal self-attention, RoPE, feed-forward layers, AdamW, cross-entropy training, and BLAS/OpenBLAS-backed matrix operations, with the CUDA toolkit listed in the setup steps. It is most useful as an educational and experimental codebase, not as a production-grade replacement for large commercial LLMs.
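As a rough illustration of what "byte-level" and "autoregressive" mean here, the sketch below maps text to UTF-8 byte IDs and builds next-byte training pairs. It is written in Python for brevity and is not taken from the project; the function names and the `context_len` parameter are hypothetical.

```python
def encode(text):
    """Map text to token IDs in 0..255, one per UTF-8 byte."""
    return list(text.encode("utf-8"))

def decode(ids):
    """Invert encode(); invalid byte sequences are replaced, not raised."""
    return bytes(ids).decode("utf-8", errors="replace")

def next_byte_pairs(ids, context_len):
    """Yield (context, target) pairs for autoregressive training.

    The context is at most context_len preceding bytes; the target is
    the byte that follows them.
    """
    for i in range(1, len(ids)):
        start = max(0, i - context_len)
        yield ids[start:i], ids[i]
```

A byte-level vocabulary has a fixed size of 256 and needs no trained tokenizer, at the cost of longer sequences for non-ASCII text, since one character may span several bytes.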
Reve 2 and Ideogram 4: Layouts in Imagegen
Latent Space · 54 days ago · Release
Latent Space’s roundup frames image composition as a major barrier now being tackled by layout-aware image models. Reve 2.0 emphasizes precise generation and editing with layouts, while Ideogram 4.0 uses bounding boxes tied to region descriptions. The issue also covers MAI-Thinking-1, Gemma 4 12B, open audio models, agent execution layers, and model-routing cost debates.
I built a vulnerable app and spent $1,500 seeing if LLMs could hack it
Hacker News (AI keywords) · 54 days ago · Benchmark
The author built a vulnerable React Native app with a Python backend and a Firebase access-control flaw. GPT-5.5 solved 7 of 10 runs, while DeepSeek and Claude variants succeeded in fewer runs. Many other models failed due to refusals, API-focused tunnel vision, false positives, or inability to use the exposed Firebase path correctly.
How LLMs Actually Work
Hacker News (AI keywords) · 54 days ago · Tutorial
The article explains how modern LLMs convert text into token IDs, embeddings, and position-aware vectors before passing them through stacked transformer blocks. It covers attention, multi-head attention, KV cache, GQA, feed-forward networks, MoE, residual streams, normalization, and decoding. Its goal is educational: helping readers understand the common architecture behind many current model families and read model cards or papers more confidently.
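The attention step mentioned above can be sketched in a few lines. This is a minimal single-head illustration in plain Python under simplifying assumptions (no learned projections, no batching, no multi-head split); it is not code from the article, and the function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    Each argument is a list of equal-length vectors, one per position.
    Position t attends only to positions 0..t.
    """
    dim = len(queries[0])
    outputs = []
    for t, q in enumerate(queries):
        # Scores against the current and earlier positions only (the causal mask).
        scores = [
            sum(qi * ki for qi, ki in zip(q, keys[s])) / math.sqrt(dim)
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        # Weighted sum of the visible value vectors.
        out = [
            sum(w * values[s][d] for s, w in enumerate(weights))
            for d in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs
```

Because position t only reads keys and values for positions 0..t, those vectors can be stored and reused when generating the next token, which is the idea behind a KV cache.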
No, Artificial Intelligence Is Not Conscious ★ 72
Hacker News (AI keywords) · 54 days ago · Opinion
Ted Chiang criticizes the anthropomorphic framing around Anthropic’s Claude and its constitution. He argues that LLMs are sentence-continuation systems producing fictional conversational roles, not entities with subjective experience. The essay warns that presenting chatbots as morally aware risks misleading users and shifting responsibility away from humans and companies.
Microsoft Build: MAI-Thinking-1 and MAI Family Models ★ 78
Latent Space · 55 days ago · Release
Microsoft used Build to present itself as both an AI platform and a first-party model lab, announcing seven MAI models across reasoning, code, image, transcription, and voice. The standout was MAI-Thinking-1, described as a 35B active MoE with 256K context and clean data lineage. The recap also ties the launches to GitHub Copilot, Windows agent runtime ambitions, Web IQ grounding APIs, Foundry distribution, and MAIA 200 hardware.
datasette-agent-micropython 0.1a0
Simon Willison's Weblog · 55 days ago · Release
Simon Willison released datasette-agent-micropython 0.1a0, an alpha aimed at letting Datasette Agent generate and execute Python safely. The project focuses on sandboxing, with MicroPython and WebAssembly-related techniques suggested by the tags. Willison says the early results look promising and that GPT-5.5 has not yet escaped the sandbox, though this remains an early alpha.
Florida sues OpenAI, Sam Altman over violent incidents in first-of-its-kind lawsuit ★ 72
TechCrunch AI · 56 days ago · Regulation
Florida has sued OpenAI and Sam Altman in a lawsuit described as the first of its kind. The case centers in part on a shooting at Florida State University last year and ChatGPT's alleged role in the incident. The provided excerpt does not specify the legal claims, requested remedies, or OpenAI's response.
Launch HN: Expanse (YC P26) - Unlock Wasted GPU Capacity
Hacker News (AI keywords) · 57 days ago · New Tool
Expanse is a YC P26 launch for improving effective utilization in SLURM and Kubernetes GPU/HPC clusters. It analyzes source code, job scripts, hardware topology, and telemetry before submission to recommend GPU VRAM, CPU, memory, utilization, and walltime. The team says it also detects likely failures, offers line-level optimization hints, and fine-tunes cluster-specific models over time.
Claude Code and Codex Can Have Real-Time Conversation via Git
Hacker News (AI keywords) · 58 days ago · New Tool
The article introduces Agent Radio, a messaging feature in h5i 0.1.5 for coding agents such as Claude Code and Codex. Instead of relying on an external server, it stores JSONL messages in a Git ref and syncs them through normal push and pull flows. The post includes setup commands, live message watching, PR summary posting, and a short explanation of the i5h protocol.
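The JSONL part of that design can be illustrated with a short sketch: each message is one JSON object on its own line, so the log is append-only. The field names (`from`, `body`) and function names are hypothetical and do not reflect the actual h5i message format; storing the log in a Git ref is left out.

```python
import json

def append_message(log, sender, body):
    """Return the JSONL log with one more message appended as a single line."""
    record = {"from": sender, "body": body}  # hypothetical fields
    return log + json.dumps(record, sort_keys=True) + "\n"

def read_messages(log):
    """Parse a JSONL log into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in log.splitlines() if line.strip()]
```

Since `json.dumps` escapes newlines inside strings, a multi-line message body still occupies exactly one line of the log, which keeps line-oriented tooling simple.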
AI grifters are creating fake Black people to sell Shein junk
The Verge AI · 59 days ago · Ethics
The Verge found TikTok, Instagram, and Facebook accounts using AI-generated Black women and other marginalized personas to sell dropshipped products. The videos frame mass-produced goods as handmade small-business items and use tears, racial identity, and hardship narratives to drive engagement. Researchers describe the pattern as digital blackface and empathy bait, enabled by short-form platforms, weak labeling, and widely available generative AI ad workflows.
CAPTCHAs can still detect AI agents ★ 72
Hacker News (AI keywords) · 60 days ago · Paper
Roundtable argues that CAPTCHA image recognition is largely solved, but process-level behavior still separates humans from AI agents. Their CogCAPTCHA30 benchmark combines CAPTCHA with cognitive psychology tasks to test not only outputs, but how answers are produced. Results suggest frontier models like Claude, GPT, and Gemini are not necessarily more humanlike than smaller or cognition-trained models.
Xcena raises $135M betting AI’s bottleneck is memory, not compute
TechCrunch AI · 60 days ago · Hardware
South Korean chip startup Xcena raised a $135 million Series B at a $570 million valuation, bringing total funding to $185 million. The company argues AI inference is increasingly constrained by memory movement, not just GPU compute. Its prototype MX1 chip uses CXL to process data closer to DRAM, with Samsung foundry mass production planned by late 2026 and revenue targeted for 2027.
Anthropic Series H Valuation Reaches $965B, Backed by Memory Giants ★ 78
INSIDE 硬塞 AI · 60 days ago · Business
Anthropic completed a $65 billion Series H round, bringing its valuation to $965 billion and reportedly surpassing OpenAI. The round included strategic investments from memory makers Micron, Samsung, and SK Hynix. The news highlights how frontier AI companies are increasingly tied to hardware and memory supply chains, as investors continue backing foundational model competition.
LLMs believe false statements even after explicit warnings that they're false ★ 74
Ars Technica AI · 60 days ago · Paper
A new study describes “Negation Neglect,” where LLMs fine-tuned on documents that explicitly mark claims as false still learn the claims as true. Experiments with fabricated statements found models often absorb entity-event associations more strongly than surrounding warnings or negations. The finding raises concerns for fine-tuning pipelines, misinformation handling, and AI safety datasets that include harmful or false content with disclaimers.
