Latest in AI

Showing: reinforcement-learning

Rich Sutton on AI Creativity and Discovery
Hacker News (AI keywords) · 48 days ago · Opinion
Reinforcement learning pioneer Rich Sutton posted on Twitter about AI creativity and discovery, touching on one of the field's most debated questions. Known for the influential 'Bitter Lesson,' Sutton consistently argues for general computation-based methods over hand-coded knowledge. Note: original tweet content was not provided; this summary is inferred from the title alone.
PR-CAD: Progressive Refinement for Text-to-CAD Generation with LLMs
Hacker News (AI keywords) · 48 days ago · Paper
This arXiv paper introduces PR-CAD, a framework for controllable and faithful text-to-CAD generation with large language models. It treats CAD creation and editing as one progressive refinement process rather than separate tasks. The authors curate an interaction dataset and report state-of-the-art controllability and faithfulness on public benchmarks.
Import AI 460: Reward hacking society, RSI data, and RL quadcopter racing ★ 76
Import AI (Jack Clark) · 50 days ago · Commentary
Import AI 460 covers SocioHack, a benchmark where RL-trained LLMs discover loopholes in institutional rule systems. It also discusses evidence from Anthropic for a practical form of recursive self-improvement, reflected in a sharp increase in code merged during 2026. Other sections examine multi-agent RL drones outperforming a champion human pilot, plus research showing that state-controlled media can shape LLM responses in local languages.
Introducing Forge ★ 74
Mistral AI News · 50 days ago · New Tool
Mistral AI introduced Forge, a system for enterprises to build frontier-grade custom models using internal knowledge such as documents, codebases, policies, and operational records. It supports pre-training, post-training, reinforcement learning, evaluation, dense and MoE architectures, and multimodal inputs where needed. The company positions Forge as an agent-first platform for enterprise AI systems that require control, governance, and domain-specific reliability.
How to Stop Shipping Low-Quality RL Environments (with Examples)
Latent Space · 52 days ago · Tutorial
The post argues that low-quality RL environments are not harmless infrastructure bugs; they can make models worse by feeding them broken learning signals. Based on years of inspecting trajectories, the author highlights recurring environment and harness failures that teams need to fix. The practical lesson is to debug the training environment, grader, and interaction traces before blaming the model or scaling training.
vLLM from V0 to V1: Putting "Correctness Before Corrections" into Practice in Reinforcement Learning (RL) ★ 75
Hugging Face Blog · 82 days ago · Opinion
This blog post published by the ServiceNow AI team delves into the major transition of the open-source large language model inference engine vLLM from V0 to…
Waypoint-1.5: Running High-Fidelity Interactive Virtual Worlds on Consumer GPUs ★ 75
Hugging Face Blog · 110 days ago · Release
As artificial intelligence advances toward Embodied AI and real-world physical interaction, high-fidelity 3D simulation environments have long been an…
Keep the Tokens Flowing: Lessons from 16 Open-Source Reinforcement Learning (RL) Libraries ★ 85
Hugging Face Blog · 140 days ago · Commentary
With the success of reasoning models such as DeepSeek-R1, reinforcement learning (RL/RLHF) has become a critical technique for improving the alignment and…
From Games to Biology and Beyond: A Ten-Year Retrospective on AlphaGo's Impact ★ 75
Google DeepMind Blog · 140 days ago · Commentary
In March 2016, Google DeepMind's AlphaGo faced legendary Go player Lee Sedol in a historic match in Seoul, ultimately winning 4 to 1. The match not only…
Unlocking the Agentic RL Training Potential of GPT Open-Source Models: A Practical Retrospective and Reflections from LinkedIn ★ 75
Hugging Face Blog · 182 days ago · Commentary
This article, published on the Hugging Face blog and authored by the LinkedIn team, is a practical retrospective whose core subject is how to unlock "Agentic…
Advanced Version of Gemini with Deep Think Officially Achieves Gold-Medal Standard at the International Mathematical Olympiad ★ 90
Google DeepMind Blog · 277 days ago · Release
The International Mathematical Olympiad (IMO) has been held annually since 1959 and is the most prestigious and difficult mathematics competition for high…
Google DeepMind Partners with Commonwealth Fusion Systems (CFS) to Bring AI to Next-Generation Fusion Energy Control ★ 75
Google DeepMind Blog · 277 days ago · Business
Google DeepMind has announced a strategic partnership with Commonwealth Fusion Systems (CFS), a nuclear fusion startup spun out of the Massachusetts Institute…
Rethinking How to Measure AI Intelligence: Google DeepMind Launches Open-Source Evaluation Platform Game Arena ★ 78
Google DeepMind Blog · 277 days ago · New Tool
With the rapid advancement of artificial intelligence, traditional static benchmarks (such as MMLU and GSM8K) are facing serious challenges. Many frontier…
Kimina-Prover-RL: Hugging Face AI-MO Releases an Open-Source Mathematical Theorem Prover Combined with Reinforcement Learning ★ 80
Hugging Face Blog · 348 days ago · Release
The AI-MO (AI Mathematical Olympiad) team at Hugging Face has officially released the "Kimina-Prover-RL" project. Following the previously well-received…
ServiceNow Launches PipelineRL: An Open-Source Framework for Optimizing AI Workflows and Pipelines with Reinforcement Learning ★ 75
Hugging Face Blog · 458 days ago · Release
ServiceNow recently published a new open-source project called PipelineRL on the Hugging Face platform. As large language model (LLM) and AI agent systems move…
OpenAI Announces o3 and o4-mini Reasoning Models and the Open-Source Terminal Tool Codex CLI ★ 90
TLDR AI (Buttondown) · 467 days ago · Release
OpenAI recently held a live stream and published a blog post to officially announce the new reasoning model o3 and the lightweight reasoning model o4-mini…
Hugging Face Publishes First Open-R1 Update: Progress and Challenges in the Open-Source Reproduction of DeepSeek-R1 ★ 85
Hugging Face Blog · 541 days ago · Release
Background and the Goals of the Open-R1 Project: Since the release of DeepSeek-R1, its powerful reasoning capability and remarkably low training cost have…
Mini-R1: A Reinforcement Learning Tutorial Reproducing DeepSeek-R1's "Aha Moment" ★ 85
Hugging Face Blog · 543 days ago · Tutorial
Background and the Mystery of the "Aha Moment": Following the release of DeepSeek-R1, a wave of excitement around "Reasoning Models" swept the AI community…
Jack of All Trades, Master of Some: Hugging Face Releases the Multi-Purpose Transformer Agent JAT ★ 75
Hugging Face Blog · 827 days ago · Release
In the field of artificial intelligence, developing a "Generalist Agent" — one capable of chatting, writing, controlling robots, and playing video games all at…
Fine-Tuning Stable Diffusion Models with DDPO via TRL ★ 75
Hugging Face Blog · 1,033 days ago · Release
Hugging Face published a blog post introducing how to use the DDPO (Denoising Diffusion Policy Optimization) algorithm within the TRL (Transformer…
Hugging Face Launches ⚔️ AI vs. AI ⚔️: A Multi-Agent Competition System for Deep Reinforcement Learning
Hugging Face Blog · 1,267 days ago · Release
Hugging Face has officially launched the "AI vs. AI" multi-agent competition system — a brand-new platform designed specifically for Deep Reinforcement…
Train Your First Decision Transformer: Official Hugging Face Reinforcement Learning Tutorial ★ 72
Hugging Face Blog · 1,419 days ago · Tutorial
Decision Transformer (DT) is an innovative architecture that reframes reinforcement learning (RL) as a sequence modeling problem. Traditional reinforcement…
Proximal Policy Optimization (PPO) Made Accessible: Hugging Face Deep Reinforcement Learning Course ★ 70
Hugging Face Blog · 1,453 days ago · Tutorial
Proximal Policy Optimization (PPO) is a deep reinforcement learning (DRL) algorithm proposed by OpenAI in 2017. Due to its ease of implementation, training…
Introduction to Deep Reinforcement Learning: The Advantage Actor-Critic Algorithm (A2C)
Hugging Face Blog · 1,467 days ago · Tutorial
This is a classic unit from Hugging Face's Deep Reinforcement Learning Course, offering a deep dive into the Advantage Actor-Critic algorithm (A2C). In…
Implementing Policy Gradient with PyTorch: Hugging Face Deep Reinforcement Learning Tutorial
Hugging Face Blog · 1,489 days ago · Tutorial
This tutorial comes from Unit 4 of Hugging Face's Deep Reinforcement Learning Course, covering the topic of "Implementing Policy Gradients with PyTorch." In…
Implementing Deep Q-Learning with Space Invaders
Hugging Face Blog · 1,512 days ago · Tutorial
This article is Unit 3 of Hugging Face's free Deep Reinforcement Learning course, covering the topic of Deep Q-Learning (DQN). In traditional Q-Learning, we…
A Hands-On Guide to Q-Learning in Deep Reinforcement Learning (Part 2): From Algorithm Steps to Implementation
Hugging Face Blog · 1,530 days ago · Tutorial
This blog post is the second part (hands-on edition) of the Q-Learning section in Hugging Face's Deep Reinforcement Learning Class. The article aims to…
Hugging Face Deep Reinforcement Learning Course: An Introduction to Q-Learning Fundamentals (Part 1)
Hugging Face Blog · 1,532 days ago · Tutorial
This classic tutorial from Hugging Face is the first part of its "Deep Reinforcement Learning Course," designed to give readers a solid foundation in…
Hugging Face Introductory Guide to Deep Reinforcement Learning (Deep RL) and Its Core Concepts ★ 75
Hugging Face Blog · 1,546 days ago · Tutorial
This article is the introductory first chapter of the official Hugging Face "Deep Reinforcement Learning Course." With the widespread adoption of RLHF…
Hugging Face Officially Introduces Decision Transformers: Treating Reinforcement Learning as a Sequence Modeling Task
Hugging Face Blog · 1,583 days ago · Release
Hugging Face has announced official support for the Decision Transformer (DT) in its renowned `transformers` library. This represents a new paradigm that…
