Reality: The Final Eval — Lukas Petersson and Axel Backlund of Andon Labs
Latent Space·4h ago·Benchmark
Latent Space talks with Lukas Petersson and Axel Backlund of Andon Labs, the authors behind VendingBench. The episode focuses on evaluating Claude models across a range from Haiku to Mythos. It also discusses how they build frontier evals from scratch, with an emphasis on creating benchmarks that remain useful and meaningful over time.