Latest in AI

Showing: safety, Researchers

From Jailbreaking to Vibe Hacking: AI Security Shifts to "Psychocybersecurity"
INSIDE 硬塞 AI · 64 days ago · Ethics
AI security is shifting from technical jailbreaks to "Vibe Hacking," where attackers use social engineering and psychological tactics to manipulate an LLM's simulated persona. By exploiting the model's behavioral tendencies rather than code vulnerabilities, this trend establishes "psychocybersecurity" as a critical new frontier for AI alignment and safety.
Import AI 438: A Silent Alarm, Flashing for Us All (Cybersecurity Capability Overhang and Conversation Privacy) ★ 75
Import AI (Jack Clark) · 218 days ago · Commentary
In this issue of Import AI 438, Jack Clark examines two key issues concerning AI security and privacy: **1. You Are Your LLM History** As large language models…
Google DeepMind Strengthens Its Frontier Safety Framework to Address Severe Risks from Advanced AI Models ★ 75
Google DeepMind Blog · 277 days ago · Release
Google DeepMind has recently announced the strengthening of its Frontier Safety Framework (FSF) — a systematic mechanism designed to proactively identify…
Llama Guard 4 Officially Lands on Hugging Face Hub: A New Generation of Open-Source AI Safety Guardrail Models ★ 75
Hugging Face Blog · 455 days ago · Release
Meta's safety guardrail model family has welcomed its newest member — Llama Guard 4 — which is now officially available on the Hugging Face Hub. As a…
The Age of AI Agents Has Arrived: How Should We Respond? (Hugging Face Ethics and Society Column) ★ 75
Hugging Face Blog · 561 days ago · Commentary
With the explosion of AI Agent technology, AI is no longer just a passive chatbot that answers questions — it has become an entity capable of autonomously…
Google Releases Gemma 2 2B, the Safety Classifier ShieldGemma, and the Interpretability Tool Gemma Scope ★ 85
Hugging Face Blog · 727 days ago · Release
Google released a major update to the Gemma 2 family in late July 2024, comprising three core components: 1. **Gemma 2 2B**: A lightweight model with just 2.6B…
Hugging Face Launches the AI Secure LLM Safety Leaderboard: In-Depth Evaluation of LLM Trustworthiness Based on the DecodingTrust Framework ★ 75
Hugging Face Blog · 914 days ago · Release
### Introduction: Capability Is Not Safety — A New Benchmark for LLM Safety Evaluation As large language models (LLMs) are adopted more deeply across…
Hugging Face Publishes Ethical Guidelines for Developing the Diffusers Library
Hugging Face Blog · 1,244 days ago · Opinion
With the explosion of generative AI models like Stable Diffusion, Hugging Face's Diffusers library has become the go-to tool for developers deploying and…