Latest in AI

Shipping a Trillion Parameters With a Hub Bucket: Delta Weight Sync in TRL
Hugging Face Blog | 62 days ago | Tutorial
Based on the title, this Hugging Face Blog post focuses on Delta Weight Sync in TRL. It likely discusses moving or synchronizing weight differences at very large model scale using a Hub bucket-related workflow. Without the full article, implementation details, benchmarks, APIs, and stability claims cannot be confirmed.
20x Faster Hugging Face TRL Fine-Tuning with RapidFire AI ★ 80
Hugging Face Blog | 249 days ago | Release
The Hugging Face official blog has announced a collaboration with RapidFire AI, bringing a revolutionary performance improvement to its popular TRL…
No Idle GPUs: Unlocking High-Performance Reinforcement Learning Training with Co-located vLLM in TRL ★ 85
Hugging Face Blog | 420 days ago | Release
In the reinforcement learning from human feedback (RLHF) training process for large language models — whether PPO or the recently popular GRPO — there are…
Hugging Face Introduces the RLOO Algorithm: Lower Memory Consumption, Bringing Reinforcement Learning Back to the RLHF Mainstream ★ 80
Hugging Face Blog | 776 days ago | Release
In recent years, methods such as Direct Preference Optimization (DPO) have become mainstream for large language model (LLM) alignment, as they eliminate the…
Preference Tuning LLMs with Direct Preference Optimization (DPO) Methods ★ 80
Hugging Face Blog | 922 days ago | Tutorial
This technical blog post from Hugging Face takes an in-depth look at the latest techniques in "preference tuning," with a particular focus on Direct…