Latest in AI

Showing:BenchmarkGeneralMistralClear ×

Topic

Release New Tool Tutorial Business Paper Benchmark Opinion Regulation

For

General Developers Designers Product Founders Marketing Researchers Students

介紹 HELMET：全面評估長文本語言模型（Long-context LLMs）的新一代基準測試★ 80
Hugging Face Blog468 days agoRelease
### Background and Pain Points: Moving Beyond the Overly Simple "Needle in a Haystack" Test In recent years, the context window length supported by large…
Hugging Face 推出「企業情境排行榜」：專為真實世界應用設計的 LLM 評測基準★ 75
Hugging Face Blog909 days agoRelease
Hugging Face has partnered with Patronus AI — a startup focused on LLM evaluation and defense — to officially launch the **Enterprise Scenarios Leaderboard**…