Latest in AI

Showing:BenchmarkOpen-sourceClear ×

Topic

Release New Tool Tutorial Business Paper Benchmark Opinion Regulation

For

General Developers Designers Product Founders Marketing Researchers Students

Hugging Face 推出 Open Medical-LLM 排行榜：標準化評估醫療保健領域的大型語言模型★ 75
Hugging Face Blog830 days agoRelease
Hugging Face has announced the official launch of the "Open Medical-LLM Leaderboard" in collaboration with researchers from Open Life Science AI and the…
推出 LiveCodeBench 排行榜：全面且無污染的程式碼大語言模型評估★ 75
Hugging Face Blog833 days agoRelease
As code large language models (Code LLMs) develop rapidly, fairly and accurately evaluating their capabilities has become a major challenge. Traditional…
Hugging Face 推出 ConTextual 排行榜：評估多模態模型在富含文本場景中的圖文聯合推理能力★ 75
Hugging Face Blog875 days agoRelease
Hugging Face has announced the launch of a new multimodal benchmark and leaderboard called "ConTextual," aimed at addressing the shortcomings of existing…
Hugging Face 推出 NPHardEval 排行榜：透過計算複雜度與動態更新揭示大型語言模型的推理能力★ 75
Hugging Face Blog907 days agoRelease
Hugging Face has announced the launch of the new **NPHardEval** leaderboard — a benchmark specifically designed to evaluate the reasoning capabilities of large…
Hugging Face 推出「企業情境排行榜」：專為真實世界應用設計的 LLM 評測基準★ 75
Hugging Face Blog909 days agoRelease
Hugging Face has partnered with Patronus AI — a startup focused on LLM evaluation and defense — to officially launch the **Enterprise Scenarios Leaderboard**…
訓練與推論速度大對決：Habana Gaudi®2 效能超越 Nvidia A100 80GB
Hugging Face Blog1,322 days agoCommentary
As large language models (LLMs) and generative AI exploded in popularity, demand for computing power surged dramatically, leaving Nvidia GPUs (such as the…
超大型語言模型及其評估方法：Hugging Face 推出 Hub 上的零樣本評估★ 75
Hugging Face Blog1,394 days agoNew Tool
In late 2022, as massive language models like BLOOM and OPT emerged one after another, the AI community faced a core pain point: how to effectively and…

← PreviousPage 2