Latest in AI

GLM-5.2 Claims Top Open-Weights Spot on Artificial Analysis Intelligence Index
Hacker News (AI keywords) · 41 days ago · Benchmark
GLM-5.2, the latest open-weights model from Zhipu AI, has claimed the top position on the Artificial Analysis Intelligence Index among all openly available models. This marks a notable shift in the open-weights leaderboard, which tracks quality, speed, and price across dozens of frontier and community models. The result signals continued momentum from Chinese AI labs producing competitive open-weights alternatives to proprietary frontier systems.
Zhipu's Open-Source GLM-5.2 Claims Top AI Coding Rank, Second Only to Fable-5
量子位 QbitAI · 41 days ago · Benchmark
Zhipu AI has released GLM-5.2, an open-source large language model that has claimed the top position in AI coding benchmarks among all models except Anthropic's Fable-5. The result marks a significant milestone for the open-source community, showing that the gap between proprietary frontier models and open-source alternatives in code generation continues to shrink. For developers seeking capable, self-hostable coding models, GLM-5.2 now represents the strongest open-source option available.
GLM 5.2 Performance Benchmarks — Artificial Analysis
Hacker News (AI keywords) · 41 days ago · Benchmark
Artificial Analysis, an independent AI evaluation platform, has released benchmark results for Zhipu AI's GLM 5.2 language model. The evaluation covers the standard Artificial Analysis methodology, which typically assesses output quality, inference speed, and price-per-token. GLM 5.2 is the latest iteration of Zhipu AI's flagship model series, and the evaluation places it against leading frontier models on a common scoring framework.
Hugging Face and IBM Jointly Launch Open Agent Leaderboard: A New Benchmark for Evaluating Open-Source AI Agent Performance ★ 80
Hugging Face Blog · 71 days ago · Release
Hugging Face and IBM Research have jointly announced the launch of the "Open Agent Leaderboard," aimed at establishing an objective, standardized, and fully…
QIMMA ⛰: The First Quality-First Arabic Large Language Model (LLM) Leaderboard
Hugging Face Blog · 98 days ago · Release
The Technology Innovation Institute (TII) of the United Arab Emirates — the organization behind the well-known open-source model Falcon — has officially…
Hugging Face Launches New Open ASR Leaderboard Tracks: A Focus on Multilingual and Long-Form Audio Speech Recognition Trends ★ 75
Hugging Face Blog · 249 days ago · Release
Hugging Face recently made a major upgrade to its flagship "Open ASR Leaderboard," officially launching two brand-new evaluation tracks: "Multilingual" and…
Hugging Face Introduces New Standards for Arabic LLM Evaluation: Arabic Instruction Following (IFEval) and an Updated AraGen
Hugging Face Blog · 476 days ago · Release
Hugging Face recently announced a major upgrade to its Arabic Large Language Model (LLM) leaderboard, aiming to provide a more credible and comprehensive…
Hugging Face Launches Math-Verify: Correcting Math Evaluation Bias in the Open LLM Leaderboard ★ 78
Hugging Face Blog · 529 days ago · New Tool
Hugging Face's Open LLM Leaderboard has long served as an important barometer for measuring the capabilities of open-source large language models (LLMs)…
Hugging Face Launches the Second-Generation Open Arabic LLM Leaderboard (Open Arabic LLM Leaderboard 2)
Hugging Face Blog · 533 days ago · Release
Hugging Face, in collaboration with its partners, has officially launched the "Open Arabic LLM Leaderboard 2.0." With the explosive growth of Arabic large…
Rethinking Arabic LLM Evaluation: The AraGen Benchmark and 3C3H Evaluation Framework Arrive on Hugging Face
Hugging Face Blog · 601 days ago · Release
Background and Challenges: The Difficulty of Evaluating Non-English LLMs. In the current landscape of large language model (LLM) development, evaluating…
Hugging Face Launches the New "Open Japanese LLM Leaderboard" to Accelerate Japanese LLM Evaluation ★ 75
Hugging Face Blog · 615 days ago · New Tool
Hugging Face has officially launched the "Open Japanese LLM Leaderboard," a community-driven platform dedicated to evaluating the performance of…
Hugging Face Launches the Open FinLLM Leaderboard: An Open-Source Evaluation Benchmark Built for Financial LLMs ★ 75
Hugging Face Blog · 662 days ago · Release
Hugging Face has officially launched the "Open FinLLM Leaderboard" — a new platform dedicated to evaluating and tracking the performance of large language…
Hugging Face and Artificial Analysis Launch the "Text to Image" Leaderboard and Arena ★ 75
Hugging Face Blog · 782 days ago · New Tool
Hugging Face has partnered with independent AI evaluation organization Artificial Analysis to officially launch the "Text to Image Leaderboard & Arena." This…
Hugging Face Launches the Open Arabic LLM Leaderboard to Accelerate Arabic LLM Evaluation and Development
Hugging Face Blog · 805 days ago · Release
Hugging Face has announced the launch of the "Open Arabic LLM Leaderboard," an important initiative aimed at advancing Arabic natural language processing (NLP)…
Hugging Face Launches an Open Leaderboard for Hebrew LLMs, Advancing Evaluation of Non-English AI Models
Hugging Face Blog · 814 days ago · Release
Hugging Face has officially launched the "Open Leaderboard for Hebrew LLMs," an open-source evaluation platform specifically designed for Hebrew large language…
Hugging Face Partners with Artificial Analysis to Launch an LLM Performance and Cost Leaderboard ★ 75
Hugging Face Blog · 816 days ago · New Tool
Hugging Face has announced a partnership with the independent AI performance analytics firm Artificial Analysis, officially integrating its "LLM Performance…
Hugging Face Launches the Open Chain of Thought (CoT) Leaderboard: Evaluating the Reasoning and Chain-of-Thought Abilities of Open-Source Models ★ 75
Hugging Face Blog · 826 days ago · Release
Hugging Face has announced the launch of the new "Open Chain of Thought (CoT) Leaderboard," a public platform specifically designed to evaluate and compare the…
Hugging Face Launches the Open Medical-LLM Leaderboard: Standardized Evaluation of Large Language Models in Healthcare ★ 75
Hugging Face Blog · 830 days ago · Release
Hugging Face has announced the official launch of the "Open Medical-LLM Leaderboard" in collaboration with researchers from Open Life Science AI and the…
Hugging Face and Upstage Launch the Open Ko-LLM Leaderboard: Leading the Korean LLM Evaluation Ecosystem
Hugging Face Blog · 889 days ago · Release
Hugging Face and South Korea's leading AI startup Upstage have jointly announced the launch of the "Open Ko-LLM Leaderboard." This is a brand-new evaluation…
Hugging Face Launches the "Hallucinations Leaderboard," an Open-Source Effort to Quantify LLM Hallucination Rates ★ 75
Hugging Face Blog · 911 days ago · Release
While large language models (LLMs) have demonstrated remarkable generative capabilities across many domains, "hallucination" — where a model confidently…
Hugging Face Launches the AI Secure LLM Safety Leaderboard: In-Depth Evaluation of LLM Trustworthiness Based on the DecodingTrust Framework ★ 75
Hugging Face Blog · 914 days ago · Release
Introduction: Capability Is Not Safety, a New Benchmark for LLM Safety Evaluation. As large language models (LLMs) are adopted more deeply across…
How to Build Your Own Hugging Face Leaderboard: A Complete Guide Using Vectara's Hallucination Leaderboard as an Example ★ 75
Hugging Face Blog · 928 days ago · Tutorial
In the open-source AI community, the Hugging Face Open LLM Leaderboard serves as an important benchmark for evaluating model capabilities. However, many…
Open LLM Leaderboard: A Deep Dive into the DROP Benchmark and Leaderboard Gaming by Models ★ 75
Hugging Face Blog · 970 days ago · Commentary
The Hugging Face Open LLM Leaderboard has long served as an important benchmark for the community to evaluate the capabilities of open-source models. However…
Hugging Face Launches the New "Object Detection Leaderboard"
Hugging Face Blog · 1,044 days ago · New Tool
Hugging Face has officially launched the "Object Detection Leaderboard," a brand-new evaluation platform designed for the computer vision field. With the rapid…
What's Going On with the Open LLM Leaderboard? A Deep Dive into Evaluation Score Discrepancies ★ 75
Hugging Face Blog · 1,131 days ago · Commentary
Background: The Gap Between Leaderboard Scores and Paper Results. By mid-2023, Hugging Face's Open LLM Leaderboard had become the community's go-to platform…
