Frontier Post-Training Recipe Review with Finbarr Timbers

Original: Frontier post-training recipe review with Finbarr Timbers

Nathan Lambert interviews Finbarr Timbers on how frontier labs approach post-training for large language models.

In the 18th installment of his interview series, Interconnects author Nathan Lambert speaks with Finbarr Timbers about the post-training techniques used at frontier AI labs. The conversation examines the methods that shape model behavior after pretraining, including supervised fine-tuning, reinforcement learning from human feedback, and preference optimization. It offers a practitioner's perspective on alignment and capability tuning at scale.

In episode 18 of his ongoing interview series, Nathan Lambert, author of the Interconnects newsletter and a prominent voice in AI alignment and post-training research, sits down with Finbarr Timbers to review how frontier AI laboratories structure their post-training pipelines. Post-training refers to the techniques applied to a base language model after large-scale pretraining: supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), direct preference optimization (DPO), and newer variants that have spread across labs in recent years.

Summaries are AI-generated; the original article is authoritative.