Latest in AI

Import AI 460: Reward hacking society, RSI data, and RL quadcopter racing ★ 76
Import AI (Jack Clark) · 50 days ago · Commentary
Import AI 460 covers SocioHack, a benchmark where RL-trained LLMs discover loopholes in institutional rule systems. It also discusses evidence from Anthropic for a practical form of recursive self-improvement, reflected in a sharp increase in code merged during 2026. Other sections examine multi-agent RL drones outperforming a champion human pilot, plus research showing that state-controlled media can shape LLM responses in local languages.
From Games to Biology and Beyond: A Ten-Year Retrospective on AlphaGo's Impact ★ 75
Google DeepMind Blog · 141 days ago · Commentary
In March 2016, Google DeepMind's AlphaGo faced legendary Go player Lee Sedol in a historic match in Seoul, ultimately winning 4 to 1. The match not only…
Advanced Version of Gemini with Deep Think Officially Achieves Gold-Medal Standard at the International Mathematical Olympiad ★ 90
Google DeepMind Blog · 277 days ago · Release
The International Mathematical Olympiad (IMO) has been held annually since 1959 and is the most prestigious and difficult mathematics competition for high…
Google DeepMind Partners with Commonwealth Fusion Systems (CFS) to Bring AI to Next-Generation Fusion Energy Control ★ 75
Google DeepMind Blog · 277 days ago · Business
Google DeepMind has announced a strategic partnership with Commonwealth Fusion Systems (CFS), a nuclear fusion startup spun out of the Massachusetts Institute…
OpenAI Announces o3 and o4-mini Reasoning Models and the Open-Source Terminal Tool Codex CLI ★ 90
TLDR AI (Buttondown) · 467 days ago · Release
OpenAI recently held a live stream and published a blog post to officially announce the new reasoning model o3 and the lightweight reasoning model o4-mini…
Hugging Face Releases First Open-R1 Update: Progress and Challenges in the Open-Source Reproduction of DeepSeek-R1 ★ 85
Hugging Face Blog · 541 days ago · Release
Since the release of DeepSeek-R1, its powerful reasoning capability and remarkably low training cost have…