ITBench-AA: Frontier Models Score Below 50% on Enterprise IT Tasks★ 72
Hugging Face Blog·6d ago·Benchmark
Artificial Analysis and IBM present ITBench-AA, described in the title as the first benchmark for agentic enterprise IT tasks. The headline result is that frontier models score below 50%, suggesting current systems still struggle with enterprise-grade agent workflows. The original article text is unavailable here, so task design, evaluated models, scoring methodology, and rankings cannot be confirmed.