Frontier Post-Training Recipe Review with Finbarr Timbers
Original: Frontier post-training recipe review with Finbarr Timbers
Nathan Lambert interviews Finbarr Timbers on how frontier labs approach post-training for large language models.
In the 18th installment of his interview series, Interconnects author Nathan Lambert speaks with Finbarr Timbers about the post-training techniques used at frontier AI labs. The conversation examines the methodologies — including supervised fine-tuning, reinforcement learning from human feedback, and preference optimization — that shape model behavior after pretraining. The discussion offers a practitioner's perspective on the evolving landscape of alignment and capability tuning at scale.
In episode 18 of his ongoing interview series, Nathan Lambert — author of the Interconnects newsletter and a prominent voice in AI alignment and post-training research — sits down with Finbarr Timbers to review how frontier AI laboratories structure their post-training pipelines. Post-training refers to the suite of techniques applied to a base language model after the initial large-scale pretraining phase; it encompasses supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), direct preference optimization (DPO), and newer variants that have proliferated across labs in recent years.