Latest in AI

Showing: rlhf · Developers

Frontier Post-Training Recipe Review with Finbarr Timbers
Interconnects (Nathan L.) · 42 days ago · Commentary
In the 18th installment of his interview series, Interconnects author Nathan Lambert speaks with Finbarr Timbers about the post-training techniques used at frontier AI labs. The conversation examines the methods that shape model behavior after pretraining, including supervised fine-tuning, reinforcement learning from human feedback, and preference optimization. The discussion offers a practitioner's perspective on how alignment and capability tuning at scale are evolving.
The Evolution from vLLM V0 to V1: Putting "Correctness Before Corrections" into Practice in Reinforcement Learning (RL) ★ 75
Hugging Face Blog · 82 days ago · Opinion
This blog post from the ServiceNow AI team examines the major transition of vLLM, the open-source large language model inference engine, from V0 to…
Interpreting the Current Performance Gap Between Open and Closed AI Models: Beyond the Myth of a Single Evaluation Metric ★ 75
Interconnects (Nathan L.) · 98 days ago · Opinion
In today's AI landscape, the performance gap between open-weights models (such as Meta's Llama family) and closed-source models (such as OpenAI's GPT and…
Ecom-RLVE: Adaptive, Verifiable Reinforcement Learning Environments for E-Commerce Conversational Agents ★ 75
Hugging Face Blog · 103 days ago · Release
As large language models (LLMs) become increasingly widespread, more and more companies are attempting to deploy AI agents in e-commerce customer service and…
Nathan Lambert's Latest Updates: The ATOM Report, a Post-Training Course, a New Book, and Ongoing AI Research ★ 70
Interconnects (Nathan L.) · 104 days ago · Release
Nathan Lambert, a prominent AI expert, former Alignment Scientist at Hugging Face, and founder of the popular newsletter Interconnects, recently wrote about…
Hugging Face Releases TRL v1.0: An Open-Source Library Built for Post-Training Reaches a New Milestone in API Stability and Efficient Alignment ★ 85
Hugging Face Blog · 119 days ago · Release
Hugging Face has officially announced the release of TRL (Transformer Reinforcement Learning) v1.0. This is a major milestone, marking TRL's transformation…
Lossy Self-Improvement: Why AI Self-Improvement Is Real but Won't Lead to a "Sudden Explosion" ★ 75
Interconnects (Nathan L.) · 127 days ago · Opinion
This article takes a deep dive into one of the most contentious topics in artificial intelligence: AI "self-improvement" and whether it will trigger a "fast…
Keep the Tokens Flowing: Lessons from 16 Open-Source Reinforcement Learning (RL) Libraries ★ 85
Hugging Face Blog · 140 days ago · Commentary
With the success of reasoning models such as DeepSeek-R1, reinforcement learning (RL/RLHF) has become a critical technique for improving the alignment and…
Speeding Up Hugging Face TRL Fine-Tuning 20x with RapidFire AI ★ 80
Hugging Face Blog · 249 days ago · Release
The official Hugging Face blog has announced a collaboration with RapidFire AI that brings a large performance improvement to its popular TRL…
Rethinking Agent Generalization: MiniMax M2 Asks "What Are We Actually Aligning?" ★ 75
Hugging Face Blog · 271 days ago · Opinion
This article, published on the Hugging Face Blog, explores one of the most cutting-edge topics in AI today: the challenges of alignment and…
Keeping GPUs Busy: Unlocking High-Performance Reinforcement Learning Training with Co-located vLLM in TRL ★ 85
Hugging Face Blog · 420 days ago · Release
In reinforcement learning from human feedback (RLHF) training for large language models, whether with PPO or the recently popular GRPO, there are…
🐯 Liger GRPO Meets TRL: Substantially Reducing GPU Memory Use and Speeding Up DeepSeek-R1-Style Reinforcement Learning Training ★ 82
Hugging Face Blog · 429 days ago · New Tool
Since the explosive rise of DeepSeek-R1, GRPO (Group Relative Policy Optimization) has become the most widely discussed reinforcement learning (RL) technique…
Hugging Face Publishes Open R1 Update #4: Latest Progress and Optimizations in Open-Source Reasoning Model Training ★ 85
Hugging Face Blog · 488 days ago · Release
Hugging Face's Open R1 project aims to fully open-source and replicate the training pipeline of DeepSeek-R1's reasoning model. In the latest fourth update…
Open R1 Update #3: Hugging Face Releases Open-Source Reasoning Models and GRPO Training Optimization Details ★ 85
Hugging Face Blog · 503 days ago · Release
Since its launch, Hugging Face's Open R1 project has been dedicated to replicating the reasoning capabilities of DeepSeek-R1 in a fully open-source manner. In…
Open R1 Update #2: Hugging Face's Latest Progress in Replicating DeepSeek-R1 and Its Reinforcement Learning Practice ★ 85
Hugging Face Blog · 533 days ago · Release
Hugging Face has officially published the second technical update (Update #2) for the Open R1 project, which aims to replicate DeepSeek-R1's reasoning model…
Open-R1: Hugging Face Launches a Fully Open-Source Project to Reproduce DeepSeek-R1 ★ 90
Hugging Face Blog · 546 days ago · Release
Project Background: Recreating the Open-Source Miracle of DeepSeek-R1. The emergence of DeepSeek-R1 sent shockwaves through the global AI community…
The Hugging Face Community Launches an Open Preference Dataset for Text-to-Image Generation ★ 75
Hugging Face Blog · 596 days ago · Release
Introduction: An Important Piece of the Open-Source Image Generation Puzzle. As text-to-image (T2I) technology advances rapidly, ensuring that AI-generated…
Argilla 2.4 Released: Build Fine-Tuning and Evaluation Datasets on the Hugging Face Hub Without Writing Code ★ 75
Hugging Face Blog · 631 days ago · Release
The open-source data curation and annotation platform Argilla has officially released version 2.4, with the core of this update being deep integration with…
Hugging Face's "Data Is Better Together" Community Data Collaboration Initiative: Looking Back and Looking Ahead
Hugging Face Blog · 768 days ago · Release
Background: In the current development of large language models (LLMs), high-quality alignment data (such as the preference data required for RLHF and DPO)…
Hugging Face Introduces the RLOO Algorithm: Lower Memory Consumption Brings Reinforcement Learning Back into the RLHF Mainstream ★ 80
Hugging Face Blog · 776 days ago · Release
In recent years, methods such as Direct Preference Optimization (DPO) have become mainstream for large language model (LLM) alignment, as they eliminate the…
The Era of Data Crowdsourcing Arrives: Building Better Community Datasets with Argilla and Hugging Face Spaces ★ 75
Hugging Face Blog · 876 days ago · New Tool
Background and Challenge: The High-Quality Data Bottleneck. In the current development of generative AI and large language models (LLMs), the industry…
Preference Tuning LLMs with Direct Preference Optimization (DPO) Methods ★ 80
Hugging Face Blog · 922 days ago · Tutorial
This technical blog post from Hugging Face takes an in-depth look at the latest techniques in "preference tuning," with a particular focus on Direct…
A Deep Dive: The N Key Implementation Details of RLHF with PPO ★ 85
Hugging Face Blog · 1,008 days ago · Tutorial
This technical blog post from Hugging Face takes an in-depth look at the critical "implementation details" that are routinely glossed over in academic papers…
Fine-Tuning Llama 2 with DPO: A Hands-On Guide with Hugging Face TRL ★ 80
Hugging Face Blog · 1,085 days ago · Tutorial
Background and Pain Points: Traditional RLHF (Reinforcement Learning from Human Feedback), while it has achieved enormous success with models like ChatGPT…
Can Base Models Label Data Like Humans? Hugging Face Examines the Feasibility of AI Labeling for RLHF ★ 75
Hugging Face Blog · 1,142 days ago · Commentary
In the development of large language models (LLMs), RLHF (Reinforcement Learning from Human Feedback) is the critical step for aligning models with human…
StackLLaMA: A Hands-On Guide to Fine-Tuning LLaMA with RLHF ★ 80
Hugging Face Blog · 1,210 days ago · Tutorial
This classic blog post from Hugging Face gives the open-source community a hands-on guide detailing how to fine-tune the LLaMA…
Fine-Tuning 20B LLMs with RLHF on a 24GB Consumer GPU ★ 85
Hugging Face Blog · 1,237 days ago · Release
This technical blog post from Hugging Face introduces how to combine TRL (Transformer Reinforcement Learning) and PEFT (Parameter-Efficient Fine-Tuning)…
What Makes a Dialog Agent Useful? An In-Depth Analysis from Hugging Face ★ 75
Hugging Face Blog · 1,281 days ago · Opinion
Amid the generative AI wave sparked by ChatGPT, Hugging Face published this in-depth article exploring how to transform "base language models", which can only…
Illustrating Reinforcement Learning from Human Feedback (RLHF): The Key Alignment Technique Behind ChatGPT ★ 85
Hugging Face Blog · 1,327 days ago · Tutorial
The release of ChatGPT in late 2022 triggered an explosion in generative AI, and a key technology behind it is Reinforcement Learning from Human…
Proximal Policy Optimization (PPO) Explained: A Hugging Face Deep Reinforcement Learning Tutorial ★ 70
Hugging Face Blog · 1,453 days ago · Tutorial
Proximal Policy Optimization (PPO) is a deep reinforcement learning (DRL) algorithm proposed by OpenAI in 2017. Due to its ease of implementation, training…