Latest in AI

BigCodeBench: The Next Generation of HumanEval
Hugging Face Blog · 715d ago · Release
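The long-standing HumanEval code benchmark has become saturated and is now too easy for current models. Hugging Face, working with a research team, has released BigCodeBench, a successor benchmark of 1,140 practical programming tasks spanning 139 Python libraries. It tests how well LLMs generate code and follow instructions in complex, multi-step, realistic development scenarios, and is positioned as the next standard for evaluating code LLMs.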