Latest in AI

Showing:BenchmarkGeneralOpen-sourceClear ×

Topic

Release New Tool Tutorial Business Paper Benchmark Opinion Regulation

For

General Developers Designers Product Founders Marketing Researchers Students

TTS Benchmark Revamped with Objective Standards and Blind ELO Voting (46 Models)
r/LocalLLaMA top day49 days agoBenchmark
Reddit user UkieTechie has revamped their TTS benchmark platform with objective scoring standards and live blind voting, now covering 46 speech synthesis models. Hosted on Hugging Face Space, the arena lets users vote on audio quality without knowing the model name, generating a dynamic ELO leaderboard. The project is open-source on GitHub and welcomes community submissions of new models.
QIMMA ⛰：首個品質優先的阿拉伯語大型語言模型（LLM）排行榜
Hugging Face Blog98 days agoRelease
The Technology Innovation Institute (TII) of the United Arab Emirates — the organization behind the well-known open-source model Falcon — has officially…
IBM 與柏克萊加州大學推出 IT-Bench 與 MAST：診斷企業級 AI Agent 失敗原因的全新基準與框架★ 80
Hugging Face Blog159 days agoRelease
### The Pain Points of Enterprise AI Agents in Production: Why Do They Keep Failing? As large language models (LLMs) have rapidly advanced, enterprises have…
AssetOpsBench：彌合 AI Agent 評估基準與工業實際應用差距的全新基準測試★ 75
Hugging Face Blog188 days agoRelease
In today's era of rapid development in AI Agent technology, how to evaluate the performance of these Agents in real-world settings — particularly in industrial…
Google DeepMind 推出 FACTS 基準測試套件：系統化評估大型語言模型的真實性★ 80
Google DeepMind Blog231 days agoRelease
As large language models (LLMs) are deployed across a wide range of industries, ensuring the "factuality" of model outputs and reducing "hallucination" has…
介紹 HELMET：全面評估長文本語言模型（Long-context LLMs）的新一代基準測試★ 80
Hugging Face Blog468 days agoRelease
### Background and Pain Points: Moving Beyond the Overly Simple "Needle in a Haystack" Test In recent years, the context window length supported by large…
讓大型模型展開辯論：首屆多語言 LLM 辯論賽★ 75
Hugging Face Blog615 days agoRelease
This article from the Hugging Face blog introduces "The First Multilingual LLM Debate Competition." As large language models (LLMs) have rapidly advanced…
Hugging Face 推出 Open FinLLM 排行榜：專為金融領域大語言模型打造的開源評測基準★ 75
Hugging Face Blog662 days agoRelease
Hugging Face has officially launched the "Open FinLLM Leaderboard" — a new platform dedicated to evaluating and tracking the performance of large language…
Hugging Face 推出「企業情境排行榜」：專為真實世界應用設計的 LLM 評測基準★ 75
Hugging Face Blog909 days agoRelease
Hugging Face has partnered with Patronus AI — a startup focused on LLM evaluation and defense — to officially launch the **Enterprise Scenarios Leaderboard**…