OpenAI's Noam Brown, Ilge Akkaya and Hunter Lightman on o1 and Teaching LLMs to Reason Better

Published: Oct 2, 2024

Combining LLMs with AlphaGo-style deep reinforcement learning has been a holy grail for many leading AI labs, and with o1 (aka Strawberry) we are seeing the most general merging of the two modes to date. o1 is admittedly better at math than essay writing, but it has already achieved SOTA on a number of math, coding and reasoning benchmarks. Deep RL legend and now OpenAI researcher Noam Brown and teammates Ilge Akkaya and Hunter Lightman discuss the aha moments on the way to the release of o1, how it uses chains of thought and backtracking to think through problems, the discovery of strong test-time compute scaling laws and what to expect as the model gets better.

Hosted by: Sonya Huang and Pat Grady, Sequoia Capital

Mentioned in this episode:

Learning to Reason with LLMs: Technical report accompanying the launch of OpenAI o1.

Generator verifier gap: Concept Noam explains in terms of what kinds of problems benefit from more inference-time compute.

Agent57: Outperforming the human Atari benchmark: 2020 paper where DeepMind demonstrated “the first deep reinforcement learning agent to obtain a score that is above the human baseline on all 57 Atari 2600 games.”

Move 37: Pivotal move in AlphaGo’s second game against Lee Sedol where it made a move so surprising that Sedol thought it must be a mistake, and only later discovered he had lost the game to a superhuman move.

IOI competition: OpenAI entered o1 into the International Olympiad in Informatics and received a Silver Medal.

Training Data

Uploaded: Jun 11, 2026
Uploaded by: Nicholas