🤝 Want your team maximizing Claude? I run 1:1 and team AI workshops for companies doing $1M+ per year: https://aibuilder.academy/yt/RveLjcNl0ds

Here, I discuss the technical details behind the recent “advanced reasoning” models trained on large-scale reinforcement learning, i.e. o1 and DeepSeek-R1.

📰 Read more: https://shawhin.medium.com/how-to-train-llms-to-think-like-o1-deepseek-r1-eabc21c8842d?source=friends_link&sk=ec3e7ca77cd47f76ce38015c87ba5084

References
[1] https://openai.com/index/learning-to-reason-with-llms/
[2] arXiv:2501.12948 [cs.CL]
[3] https://youtu.be/7xTGNNLPyMI
[4] https://huggingface.co/datasets/open-r1/OpenR1-Math-220k
[5] https://discovery.ucl.ac.uk/id/eprint/10045895/1/agz_unformatted_nature.pdf

Intro - 0:00
OpenAI's o1 - 0:33
Test-time Compute - 1:33
"Thinking" Tokens - 3:50
DeepSeek Paper - 5:58
Reinforcement Learning - 7:22
R1-Zero: Prompt Template - 9:28
R1-Zero: Reward - 10:53
R1-Zero: GRPO (technical) - 12:53
R1-Zero: Results - 20:00
DeepSeek R1 - 23:32
Step 1: SFT with CoT - 24:47
Step 2: R1-Zero Style RL - 26:14
Step 3: SFT with Mixed Data - 27:03
Step 4: RL & RLHF - 28:26
Accessing DeepSeek Models - 29:18
Conclusions - 30:10
