Hugging Face Blog | Oct 7, 2025, 9:37 AM | important 75

Hugging Face launches BigCodeArena: end-to-end Code LLM evaluation through actual code execution

Original: BigCodeArena: Judging code generations end to end with code executions

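Hugging Face and BigCode have jointly launched a new evaluation platform, "BigCodeArena." The platform is built around end-to-end, execution-based evaluation: model-generated code is run in a secure sandbox and checked with unit tests. This addresses a weakness of traditional "LLM-as-judge" approaches and static analysis, which cannot verify whether code actually works, and it gives developers and researchers a more credible Code LLM leaderboard.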
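To make the execution-based idea concrete, here is a minimal sketch in Python. It is not BigCodeArena's implementation: the function name `run_with_tests`, the timeout value, and the sample snippets are all hypothetical, and a child process with a timeout is only a stand-in for a real security sandbox.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_tests(candidate_code: str, test_code: str, timeout_s: float = 5.0) -> bool:
    """Return True if the candidate code passes the tests when actually executed.

    Assumption: a separate Python process with a timeout stands in for the
    sandbox. This gives no real isolation from the host system.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "candidate_with_tests.py"
        # Append the unit tests to the generated code so both run in one process.
        script.write_text(candidate_code + "\n\n" + test_code, encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, "-I", str(script)],  # -I: isolated mode
                cwd=workdir,
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # Code that hangs or loops forever counts as a failure.
            return False
        # A failed assertion or any uncaught exception gives a non-zero exit code.
        return result.returncode == 0


# Hypothetical model outputs and tests, for illustration only.
good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b):\n    return a - b\n"
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0\n"

print(run_with_tests(good, tests))  # True
print(run_with_tests(bad, tests))   # False
```

Both snippets look plausible on the page, but only execution shows that the second one is wrong, which is the gap that reading-only judgments cannot close.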

Hugging Face and the BigCode community have jointly launched a new code model evaluation platform called "BigCodeArena." As AI-assisted coding (such as Copilot and Cursor) becomes part of developers' daily workflows, accurately evaluating the code generation quality of various large language models (LLMs) has become an important challenge in the AI field.

open-source other #coding #benchmark #evaluation #open-source

Summaries are AI-generated; the original article is authoritative.