How to Train LLMs to "Think" (o1 & DeepSeek-R1)

26.2K views · 912 likes · 33:18 · Feb 17, 2025

🤝 Want your team maximizing Claude? I run 1:1 and team AI workshops for companies doing $1M+ per year: https://aibuilder.academy/yt/RveLjcNl0ds

Here, I discuss the technical details behind the recent “advanced reasoning” models trained on large-scale reinforcement learning, i.e., o1 and DeepSeek-R1.

📰 Read more: https://shawhin.medium.com/how-to-train-llms-to-think-like-o1-deepseek-r1-eabc21c8842d?source=friends_link&sk=ec3e7ca77cd47f76ce38015c87ba5084

References
[1] https://openai.com/index/learning-to-reason-with-llms/
[2] arXiv:2501.12948 [cs.CL]
[3] https://youtu.be/7xTGNNLPyMI
[4] https://huggingface.co/datasets/open-r1/OpenR1-Math-220k
[5] https://discovery.ucl.ac.uk/id/eprint/10045895/1/agz_unformatted_nature.pdf

Intro - 0:00
OpenAI's o1 - 0:33
Test-time Compute - 1:33
"Thinking" Tokens - 3:50
DeepSeek Paper - 5:58
Reinforcement Learning - 7:22
R1-Zero: Prompt Template - 9:28
R1-Zero: Reward - 10:53
R1-Zero: GRPO (technical) - 12:53
R1-Zero: Results - 20:00
DeepSeek R1 - 23:32
Step 1: SFT with CoT - 24:47
Step 2: R1-Zero Style RL - 26:14
Step 3: SFT with Mixed Data - 27:03
Step 4: RL & RLHF - 28:26
Accessing DeepSeek Models - 29:18
Conclusions - 30:10
