Latest in AI

Showing: dpo, Researchers


Direct Preference Optimization Beyond Chatbots
Hugging Face Blog · 55 days ago · Tutorial
Based only on the title, this Hugging Face Blog post appears to discuss Direct Preference Optimization outside conventional chatbot use cases. It may frame DPO as a broader preference-alignment method for model outputs, workflows, or non-conversational AI systems. Without the full article, specific claims about experiments, datasets, models, or implementation details cannot be verified.
Hugging Face Releases TRL v1.0: An Open-Source Library Built for Post-Training, Reaching a New Milestone in API Stability and Efficient Alignment ★ 85
Hugging Face Blog · 119 days ago · Release
Hugging Face has officially announced the release of TRL (Transformer Reinforcement Learning) v1.0. This is a major milestone, marking TRL's transformation…
Speed Up Hugging Face TRL Fine-Tuning 20x with RapidFire AI ★ 80
Hugging Face Blog · 249 days ago · Release
The Hugging Face official blog has announced a collaboration with RapidFire AI, bringing a revolutionary performance improvement to its popular TRL…
Hugging Face TRL Supports Vision-Language Model (VLM) Alignment: Multimodal DPO and ORPO Training Made Easy ★ 80
Hugging Face Blog · 355 days ago · Release
Hugging Face's TRL (Transformer Reinforcement Learning) is a popular open-source library designed for aligning large language models (LLMs). In its…
Hugging Face Community Launches an Open Preference Dataset for Text-to-Image Generation ★ 75
Hugging Face Blog · 596 days ago · Release
Introduction: An Important Piece of the Open-Source Image Generation Puzzle. As text-to-image (T2I) technology advances rapidly, ensuring that AI-generated…
A Guide to Preference Optimization for Vision-Language Models (VLMs): DPO Fine-Tuning with TRL ★ 75
Hugging Face Blog · 748 days ago · Tutorial
As vision-language models (VLMs) are increasingly applied to multimodal tasks, how to make these models produce outputs that better align with human…
Implementing Constitutional AI with Open-Source LLMs: Hugging Face's New Alignment Guide ★ 78
Hugging Face Blog · 908 days ago · Tutorial
This blog post from Hugging Face provides an in-depth exploration of how to implement "Constitutional AI (CAI)" using open-source large language models (Open…
Preference Tuning LLMs with Direct Preference Optimization (DPO) Methods ★ 80
Hugging Face Blog · 922 days ago · Tutorial
This technical blog post from Hugging Face takes an in-depth look at the latest techniques in "preference tuning," with a particular focus on Direct…
2023: The Breakout Year for Open-Source Large Language Models (Open LLMs) ★ 75
Hugging Face Blog · 953 days ago · Commentary
Looking back on 2023, the most notable trend in the AI landscape was the explosive growth of open-source large language models (Open LLMs). In this annual…
Fine-Tuning Llama 2 with DPO: A Hands-On Guide with Hugging Face TRL ★ 80
Hugging Face Blog · 1,085 days ago · Tutorial
Background and Pain Points: Traditional RLHF (Reinforcement Learning from Human Feedback), while achieving enormous success with models like ChatGPT…