Want your team maximizing Claude? I run 1:1 and team AI workshops for companies doing $1M+ per year: https://aibuilder.academy/yt/-sL7QzDFW-4

Here, I discuss 3 types of evals and how to use them to improve LLM apps.

Blog: https://medium.com/@shawhin/how-to-evaluate-and-improve-your-llm-apps-f7b08fb7493c?sk=f2fbcd3f16b958baa4734d4a39d5b237
Example Code: https://github.com/ShawhinT/YouTube-Blog/tree/main/LLMs/evals

References
[1] https://youtu.be/XGJNo8TpuVA
[2] arXiv:2501.12948 [cs.CL]
[3] arXiv:2402.01383 [cs.CL]
[4] https://hamel.dev/blog/posts/llm-judge/
[5] arXiv:2203.02155 [cs.CL]
[6] https://youtu.be/SnbGD677_u0

--
Intro - 0:00
Vibe Checks - 0:27
Evals - 3:26
Type 1: Code-based - 5:58
Type 2: Human-based - 9:34
Type 3: LLM-based - 13:34
Example: Improving y2b with LLM Judge - 15:28