How to Evaluate (and Improve) Your LLM Apps

10.9K views · 244 likes · 27:19 · Mar 17, 2025

🤝 Want your team maximizing Claude? I run 1:1 and team AI workshops for companies doing $1M+ per year: https://aibuilder.academy/yt/-sL7QzDFW-4

Here, I discuss 3 types of evals and how to use them to improve LLM apps.

📰 Blog: https://medium.com/@shawhin/how-to-evaluate-and-improve-your-llm-apps-f7b08fb7493c?sk=f2fbcd3f16b958baa4734d4a39d5b237
💻 Example Code: https://github.com/ShawhinT/YouTube-Blog/tree/main/LLMs/evals

References
[1] https://youtu.be/XGJNo8TpuVA
[2] arXiv:2501.12948 [cs.CL]
[3] arXiv:2402.01383 [cs.CL]
[4] https://hamel.dev/blog/posts/llm-judge/
[5] arXiv:2203.02155 [cs.CL]
[6] https://youtu.be/SnbGD677_u0

--

Intro - 0:00
Vibe Checks - 0:27
Evals - 3:26
Type 1: Code-based - 5:58
Type 2: Human-based - 9:34
Type 3: LLM-based - 13:34
Example: Improving y2b with LLM Judge - 15:28
