Hugging Face BlogMay 27, 2026, 5:20 PM重要 72

ITBench-AA: Frontier Models Score Below 50% on Enterprise IT Tasks

Original: ITBench-AA: Frontier Models Score Below 50% on the First Benchmark for Agentic Enterprise IT Tasks — by Artificial Analysis and IBM

Artificial Analysis and IBM introduce ITBench-AA, where frontier models score below 50% on agentic enterprise IT tasks.

Artificial Analysis and IBM present ITBench-AA, described in the title as the first benchmark for agentic enterprise IT tasks. The headline result is that frontier models score below 50%, suggesting current systems still struggle with enterprise-grade agent workflows. The original article text is unavailable here, so task design, evaluated models, scoring methodology, and rankings cannot be confirmed.

想看英文原文 / 完整內容?

前往 Hugging Face Blog 原文 →

摘要由 AI 整理,以原文為準。