Hugging Face Blog | Feb 10, 2025, 4:10 PM | important 85

Open R1 Update #2: Hugging Face's Latest Progress in Replicating DeepSeek-R1 and Reinforcement Learning in Practice

Original: Open R1: Update #2

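Hugging Face has released the second progress report for the Open R1 project. The team shares hands-on experience running reinforcement learning (RL) training with the GRPO algorithm from the TRL library on Llama-8B and Qwen-32B, and reports reproducing the "aha moment" and reasoning chains. The update also discusses format control, training stability, and the latest evaluation results on benchmarks such as MATH and AIME.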

Hugging Face has officially published the second technical update (Update #2) for the Open R1 project, which aims to replicate DeepSeek-R1's reasoning model technology in a fully open-source and reproducible manner.
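
The summary mentions GRPO training and format control without showing what either looks like. As a rough illustration only, the sketch below scores a group of sampled completions with a format reward and then normalizes those rewards within the group, which is the core idea behind GRPO's group-relative advantage. The `<think>`/`<answer>` tag layout, the function names, and the use of population standard deviation are assumptions made for this example; this is not TRL's implementation or the Open R1 team's code.

```python
import re
import statistics

# Assumed (hypothetical) output layout: reasoning in <think>, result in <answer>.
FORMAT_PATTERN = re.compile(
    r"^<think>.+?</think>\s*<answer>.+?</answer>$", re.DOTALL
)


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion follows the assumed tag layout, else 0.0."""
    return 1.0 if FORMAT_PATTERN.match(completion.strip()) else 0.0


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of samples drawn for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps). Population
    standard deviation is a choice made for this sketch.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four hypothetical completions sampled for a single prompt.
completions = [
    "<think>2 + 2 is 4.</think><answer>4</answer>",
    "The answer is 4.",
    "<think>Add the numbers.</think>\n<answer>4</answer>",
    "<answer>4</answer>",
]

rewards = [format_reward(c) for c in completions]
advantages = group_relative_advantages(rewards)

for completion, reward, advantage in zip(completions, rewards, advantages):
    print(f"reward={reward:.1f} advantage={advantage:+.3f} {completion!r}")
```

Completions that follow the format receive a positive advantage and the others a negative one, so the policy update favors well-formatted outputs relative to the rest of the group rather than against an absolute baseline.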

llama open-source other huggingface #reasoning #rlhf #grpo #deepseek-r1 #open-r1

Summaries are AI-generated; the original article is authoritative.