Hugging Face Blog, Jun 23, 2023

What Actually Happened with the Open LLM Leaderboard? An In-Depth Look at Evaluation Score Discrepancies

Original: What's going on with the Open LLM Leaderboard?

### Background: The Gap Between Leaderboard Scores and Paper Results

By mid-2023, Hugging Face's Open LLM Leaderboard had become the…

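This article examines why some models' scores on the Hugging Face Open LLM Leaderboard, particularly on MMLU, differ from the figures reported in their official papers. Hugging Face points out that evaluation results are highly sensitive to the prompt format, the few-shot setup, and the way token probabilities are computed. To keep comparisons fair and reproducible, the leaderboard runs every model through the same tool, EleutherAI's lm-evaluation-harness, and the post calls on the community to establish standardized evaluation practices.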
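To make that sensitivity concrete, here is a minimal Python sketch of a k-shot multiple-choice evaluation. It is not the code of lm-evaluation-harness or any other framework. The prompt layout, the helper names (`build_prompt`, `pick_by_letter`, `pick_by_answer_text`), and the `toy_logprob` scorer with its invented log-probabilities are all hypothetical. The sketch contrasts two common ways of turning token probabilities into an answer. The first compares only the probability of the answer letter. The second compares the probability of the full answer text. With identical "model" probabilities, the two rules pick different answers.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# (context, continuation) -> log-probability of `continuation` given `context`.
LogProbFn = Callable[[str, str], float]
Shot = Tuple[str, Sequence[str], str]  # (question, choices, correct letter)

LETTERS = "ABCD"


def format_item(question: str, choices: Sequence[str], answer: str = "") -> str:
    """Render one multiple-choice item; this exact layout is an assumption."""
    lines = [question] + [f"{letter}. {text}" for letter, text in zip(LETTERS, choices)]
    lines.append(f"Answer: {answer}".rstrip())  # unanswered item ends with "Answer:"
    return "\n".join(lines)


def build_prompt(question: str, choices: Sequence[str],
                 shots: Sequence[Shot] = (), header: str = "") -> str:
    """Prepend an optional header and k solved examples (k-shot) to the test item."""
    parts: List[str] = [header] if header else []
    parts += [format_item(q, cs, a) for q, cs, a in shots]
    parts.append(format_item(question, choices))
    return "\n\n".join(parts)


def argmax(scores: Sequence[float]) -> int:
    """Index of the highest score (first one wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def pick_by_letter(prompt: str, choices: Sequence[str], logprob: LogProbFn) -> int:
    """Score only the answer letter (" A", " B", ...) that would follow the prompt."""
    return argmax([logprob(prompt, f" {letter}") for letter in LETTERS[:len(choices)]])


def pick_by_answer_text(prompt: str, choices: Sequence[str], logprob: LogProbFn,
                        per_char: bool = False) -> int:
    """Score the full answer text, optionally divided by its length in characters."""
    scores = []
    for text in choices:
        lp = logprob(prompt, f" {text}")
        scores.append(lp / len(text) if per_char else lp)
    return argmax(scores)


# Invented log-probabilities standing in for a real model (NOT measured values).
# The toy ignores the context, so only the scoring rule changes the outcome.
TOY: Dict[str, float] = {
    " A": -1.6, " B": -0.9, " C": -2.1, " D": -2.4,
    " Paris": -1.1, " Lyon": -3.5, " Marseille": -4.0, " Nice": -3.8,
}


def toy_logprob(context: str, continuation: str) -> float:
    return TOY[continuation]


shots = [("2 + 2 = ?", ["3", "4", "5", "22"], "B")]
choices = ["Paris", "Lyon", "Marseille", "Nice"]
prompt = build_prompt("What is the capital of France?", choices,
                      shots=shots, header="The following are multiple choice questions.")

print(pick_by_letter(prompt, choices, toy_logprob))       # 1 (letter "B")
print(pick_by_answer_text(prompt, choices, toy_logprob))  # 0 (answer "Paris")
```

Changing the header, the number of shots, or the scoring rule leaves the model untouched but changes what it is asked and how its output is judged. Scores are therefore only directly comparable when every model goes through one fixed setup.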


#open-source #evaluation #mmlu #leaderboard #benchmarking

Summaries are AI-generated; the original article is authoritative.