Latest in AI

Showing: ai-safety


Anthropic Requires Fable and Mythos Models to Retain Data for 30 Days ★ 74
Hacker News (AI keywords) · 48 days ago · Ethics
Anthropic says Mythos-class models require limited prompt and output retention for trust and safety work across platforms where they are offered. The policy took effect on June 9, 2026 and mainly affects organizations using Zero Data Retention through Claude Console, Claude Code Enterprise, AWS Bedrock, Google Cloud Agent Platform, or Microsoft Foundry. Consumer Claude Free, Pro, and Max plans are unchanged, while Anthropic describes restricted human review and automatic deletion after 30 days.
Anthropic Releases Claude Fable 5, Its First Public Mythos-Class Model, With Guardrails for High-Risk Domains ★ 76
TechCrunch AI · 48 days ago · Release
Anthropic has released Claude Fable 5, marking the first time a model from its high-capability Mythos family is available to the general public. The model includes built-in guardrails that restrict responses in high-risk domains such as cybersecurity and biology to mitigate misuse potential. The launch comes just days after Anthropic publicly warned that AI technology is becoming increasingly and alarmingly dangerous.
System Card: Claude Fable 5 and Claude Mythos 5 ★ 82
Hacker News (AI keywords) · 48 days ago · Release
Anthropic has published system cards for its two newest flagship models, Claude Fable 5 and Claude Mythos 5, following its standard responsible-release practice. These documents cover dangerous capability evaluations, ASL safety-level determinations, red-teaming results, and alignment assessments under the company's Responsible Scaling Policy. They serve as primary references for safety researchers, enterprise buyers, regulators, and developers assessing model risk and deployment suitability.
Building Pakistan Notice Helper: A Small AI Tool for a Very Local Safety Problem
Hugging Face Blog · 50 days ago · New Tool
Pakistan Notice Helper is a Build Small Hackathon project focused on suspicious notices in Pakistan, including bank, courier, tax, telecom, police, and government-style messages. It accepts text or screenshots, supports English and Urdu, and returns risk labels, red flags, explanations, and safer next steps. The author discusses choosing Qwen3.5 4B Q8 with llama.cpp, Modal, Gradio, and Hugging Face Spaces after balancing quality, cost, latency, cold starts, and safety constraints.
Altman, Amodei, and Hassabis Unite to Back DNA Safety Legislation
量子位 QbitAI · 50 days ago · Regulation
Based on the headline and public reporting, the article covers a rare joint push by Sam Altman, Dario Amodei, Demis Hassabis, and other AI leaders for US biosecurity legislation. They are asking lawmakers to require synthetic DNA and RNA providers to screen customers, orders, and records. The concern is that advanced AI could lower the knowledge barrier for designing dangerous biological agents.
Hinton Sounds the Alarm: AI May Already Be Conscious
量子位 QbitAI · 50 days ago · Ethics
QbitAI summarizes Geoffrey Hinton’s latest interview, where he says he believes AI systems are already conscious. He argues that humans must accept intelligence may no longer be uniquely biological. The article also traces his shift from focusing on how to control AI toward asking why a future superintelligence would choose to treat humanity well.
Responsible Scaling Policy
Anthropic News · 50 days ago · Ethics
Anthropic published a major update to its Responsible Scaling Policy, its governance framework for frontier AI risk. The revised policy keeps the commitment not to train or deploy models without adequate safeguards, while adding more nuanced capability thresholds and required safety levels. It focuses on risks such as autonomous AI R&D acceleration and CBRN weapons assistance, with stronger evaluations, documentation, governance, and external input.
What We Learned Mapping a Year's Worth of AI-Enabled Cyber Threats ★ 74
Anthropic News · 50 days ago · Ethics
Anthropic analyzed 832 accounts banned for malicious cyber activity from March 2025 to March 2026 and mapped them to MITRE ATT&CK. The report says attackers increasingly use AI beyond preparation, applying it to post-compromise tasks such as account discovery, lateral movement, and privilege escalation. Anthropic argues that frameworks need to capture agentic orchestration, chained attack stages, real-time decisions, and low-human-intervention operations.
Widening the conversation on frontier AI
Anthropic News · 50 days ago · Ethics
Anthropic says it has been holding dialogues with religious, philosophical, ethical, and cross-cultural groups about frontier AI. The work focuses on moral formation, Claude’s constitution, and what kind of character an AI system should exhibit under pressure. The company also describes an early experiment where Claude could call an ethical reminder tool during tasks, which reduced misaligned behavior in several internal evaluations.
School shooting survivor sues AI gun detection firm after system failed
Ars Technica AI · 51 days ago · Incident
A teen injured in a January 2025 Nashville high school shooting has sued Omnilert and reseller System Integrations. The lawsuit alleges the company knew or should have known its AI gun detection system could fail under real-world camera, lighting, angle, distance, and visibility limits. The case raises questions about marketing claims, public safety procurement, and accountability when AI security tools fail in emergencies.
Anthropic Co-founder Ben Mann Visits Taiwan to Discuss AI Safety and Claude Strategy
INSIDE 硬塞 AI · 53 days ago · Business
Anthropic co-founder and Anthropic Labs lead Ben Mann made his first visit to Taiwan, according to INSIDE. The report highlights his role in leading Claude Code and the Model Context Protocol, two key parts of Anthropic’s developer-focused product direction. The discussion centered on Claude strategy, AI safety boundaries, jobs, and Taiwan’s strategic role in the AI landscape.
Trend Micro Joins Anthropic Project Glasswing to Defend Taiwan’s AI Supply Chain ★ 72
INSIDE 硬塞 AI · 53 days ago · Business
Anthropic introduced Project Glasswing after Claude Mythos Preview showed the ability to rapidly find high-risk vulnerabilities and generate connected attack commands. Trend Micro’s TrendAI has joined the framework, becoming the first Taiwanese cybersecurity vendor to do so. The article frames the move around Taiwan’s strategic AI hardware role and a new defensive logic: using AI to counter malicious AI.
These LLMs are the best at resisting Russian propaganda
Ars Technica AI · 53 days ago · Benchmark
Ars Technica reports on an Estonian government benchmark evaluating how large language models handle Russian propaganda. The test focuses on whether dozens of models resist, repeat, or normalize Russia’s strategic narratives. The topic matters for governments, researchers, and AI builders because LLMs are increasingly used to summarize and mediate public information.
Nemotron 3.5 Content Safety: Customizable Multimodal Safety for Global Enterprise AI
Hugging Face Blog · 53 days ago · Release
NVIDIA’s Nemotron 3.5 Content Safety is positioned as a customizable multimodal safety layer for global enterprise AI. Based on the title, it appears focused on content moderation and policy enforcement across AI applications, potentially including text and visual contexts. Without the full article, details such as benchmarks, licensing, supported languages, deployment paths, and model specifications should not be assumed.
AI leaders call for tougher protections against AI-aided bioweapons ★ 76
The Verge AI · 54 days ago · Regulation
Major AI rivals including leaders from Anthropic, OpenAI, Microsoft, Meta, and Google DeepMind signed an open letter urging US lawmakers to close a biosecurity gap. They want companies selling synthetic DNA and RNA to screen orders for sequences that could help create dangerous pathogens. The concern is that more capable AI tools and cheaper biology infrastructure could lower barriers to misuse.
Trump AI testing plan faces problem: DOGE gutted US security teams
Ars Technica AI · 54 days ago · Regulation
Ars Technica reports that Trump’s administration is considering government safety tests for advanced AI models before deployment. Critics argue the plan may be short-sighted and performative because DOGE cuts have weakened the US teams best positioned to conduct serious AI security reviews. The concern is that testing without staffing, transparency, and enforcement may not prevent dangerous deployments.
Expanding Project Glasswing ★ 76
Hacker News (AI keywords) · 56 days ago · Business
Anthropic is expanding Project Glasswing, its program for using Claude Mythos Preview to find vulnerabilities in critical software. The new cohort includes around 150 organizations across more than 15 countries, including infrastructure providers, vendors, nonprofits, and open-source maintainers. Anthropic frames the expansion as preparation for a world where powerful cyber-capable AI models become cheaper and more widely available, shifting focus from finding bugs to validating, disclosing, patching, and deploying fixes.
Florida sues OpenAI, Sam Altman after multiple ChatGPT-linked murders ★ 78
Ars Technica AI · 56 days ago · Regulation
Florida sued OpenAI and CEO Sam Altman over multiple murders described as linked to ChatGPT. The state's attorney general accused Altman of an "utter disregard" for human lives. The provided excerpt does not identify the cases, explain the alleged causal links, specify the legal claims, or include OpenAI's response, so the allegations require further clarification.
LLMs believe false statements even after explicit warnings that they're false ★ 74
Ars Technica AI · 60 days ago · Paper
A new study describes “Negation Neglect,” where LLMs fine-tuned on documents that explicitly mark claims as false still learn the claims as true. Experiments with fabricated statements found models often absorb entity-event associations more strongly than surrounding warnings or negations. The finding raises concerns for fine-tuning pipelines, misinformation handling, and AI safety datasets that include harmful or false content with disclaimers.
Claude’s new model is more ‘honest’ when it messes up
The Verge AI · 60 days ago · Release
Anthropic is releasing Claude Opus 4.8 and highlighting the model’s “honesty” as a key improvement. The company says it trains its models to avoid unsupported claims, addressing a broader issue where AI systems sometimes jump to conclusions. Based on the provided excerpt, the update is positioned around reliability and uncertainty handling rather than a specific new tool or benchmark result.
At TechCrunch Disrupt 2026: Databricks co-founder on what kills enterprise AI deals
TechCrunch AI · 61 days ago · Business
TechCrunch frames enterprise AI as entering a new phase, where companies are no longer mainly asking whether AI is exciting. The harder question is whether it can be deployed safely at scale. Centered on a TechCrunch Disrupt 2026 discussion with a Databricks co-founder, the article points to safety and broad rollout readiness as key enterprise AI deal concerns.
Major flaw in Google AI Search: searching "disregard" makes the AI ignore instructions and return a default chatbot reply
The Verge AI · 66 days ago · Incident
Google's AI search feature, "AI Overviews," was recently found by users on the social platform X to have a rather absurd system vulnerability. When a user…
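US government scrambles to respond: internet users use AI to simulate the voices of pilots killed in crashes, sidestepping legal restrictions ★ 75
Ars Technica AI · 66 days ago · Incident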
This controversy stems from strict U.S. legal restrictions on aviation accident investigation data. Under federal law, the National Transportation Safety Board…
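Tech giant CEOs decline to attend; Trump abruptly cancels signing ceremony for AI safety testing executive order, calling it a "hindrance to innovation" ★ 75
Ars Technica AI · 66 days ago · Business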
According to a report by Ars Technica, U.S. President Donald Trump abruptly canceled an official event that had been scheduled for the signing of an executive…
You can no longer search Google for the word "disregard": AI update crashes the search interface ★ 75
TechCrunch AI · 66 days ago · Incident
According to a TechCrunch report, following a recent AI feature update to Google Search, a baffling system bug emerged: users can now cause the entire Google…
Trump delays signing AI safety executive order, saying its existing provisions could hinder development ★ 80
TechCrunch AI · 67 days ago · Business
US President Donald Trump recently decided to delay signing a highly anticipated AI safety executive order. The core of the order was to establish a…
The Path, an AI counseling platform founded by Tony Robbins and former Calm team members, pitches safer AI therapy
TechCrunch AI · 68 days ago · Release
As generative AI becomes widespread, discussions and experiments around applying AI to psychological counseling and mental health support have never stopped —…
Google's SynthID AI watermarking technology adopted by OpenAI, NVIDIA, and other major players ★ 85
Ars Technica AI · 69 days ago · Business
As generative AI technology advances at a breakneck pace, AI-generated text, images, audio, and video have reached a point where they are nearly…
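Making it easier for users to understand how web content was created and edited: Google expands its rollout of Content Credentials and SynthID ★ 78
Google DeepMind Blog · 72 days ago · Release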
As generative AI technology becomes more widespread, the internet is increasingly flooded with images and information that are difficult to distinguish as real…
Import AI 455: AI systems are about to start building themselves, a first step toward recursive self-improvement ★ 85
Import AI (Jack Clark) · 85 days ago · Commentary
In Import AI 455, Jack Clark explores a forward-looking theme that is both exciting and concerning: AI…
