Hugging Face Blog | May 27, 2026, 5:20 PM | Importance: 72
ITBench-AA: Frontier Models Score Below 50% on Enterprise IT Tasks
Original: ITBench-AA: Frontier Models Score Below 50% on the First Benchmark for Agentic Enterprise IT Tasks — by Artificial Analysis and IBM
Artificial Analysis and IBM introduce ITBench-AA, where frontier models score below 50% on agentic enterprise IT tasks.
Artificial Analysis and IBM present ITBench-AA, described in the title as the first benchmark for agentic enterprise IT tasks. The headline result is that frontier models score below 50%, suggesting current systems still struggle with enterprise-grade agent workflows. The original article text is unavailable here, so task design, evaluated models, scoring methodology, and rankings cannot be confirmed.
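Want to read the English original or the full content?
Go to the original post on Hugging Face Blog →
Summary compiled by AI; the original article is authoritative.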