Latent Space, Jun 4, 2026, 8:39 PM

Reality: The Final Eval — Lukas Petersson and Axel Backlund of Andon Labs

Latent Space interviews Andon Labs on VendingBench and building durable frontier AI evals.

Latent Space talks with Lukas Petersson and Axel Backlund of Andon Labs, the authors of VendingBench. The episode covers their evaluation of Claude models ranging from Haiku to Mythos, and how they build frontier evals from scratch, with an emphasis on benchmarks that stay useful and meaningful over time.

Summary compiled by AI; the original article takes precedence.