RAG - naive, advanced, multi-agent, MCP

622 views · 27 likes · 12:16 · Sep 5, 2025



Presented by Lev Selector
Slides - https://github.com/lselector/seminar/tree/master/2025 (click on the pptx file, then on "raw" or the download button on the right)
-------------------------------------------------------
Please write in the comments how you would implement RAG today. Can you recommend an open-source, ready-to-use solution? Maybe with an MCP server? Which embedding models, vector DBs, and LLMs would you use?
-------------------------------------------------------
Need AI consulting?
- Enterprise AI Solutions - https://EAIS.ai
- LinkedIn - https://www.linkedin.com/in/levselector
- GitHub - https://github.com/lselector
-------------------------------------------------------
Our books:
"Artificial Intelligence In Business: Strategies That Work" https://www.amazon.com/dp/B0FKZ3364G/
"AI Product Manager: Building Tomorrow" https://www.amazon.com/dp/B0FMS9NFK9/
---------
Contents of today's video:
Comprehensive overview of Retrieval Augmented Generation (RAG) systems, from basic concepts to advanced techniques. Here are the main takeaways:

RAG combines retrieval of relevant information from external knowledge sources with language model generation. The basic flow involves converting local data and user queries into embeddings, searching a vector database for relevant chunks, and augmenting the LLM prompt with the retrieved context.

*Key Problem*: A question and the text that answers it are not always semantically similar, so a pure similarity search can miss the relevant chunk and lead to poor retrieval performance.
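The basic flow described above (embed the data, embed the query, search for the closest chunks, augment the prompt) can be sketched in a few lines. This is only an illustration under stated assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, a Python list stands in for the vector database, and all function names and sample chunks are hypothetical. The final LLM call is left out.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: bag-of-words counts (assumption: stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(chunks):
    """'Vector DB' stand-in: a list of (chunk, embedding) pairs."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index, query, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query, context_chunks):
    """Augment the prompt with the retrieved context."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical sample data.
chunks = [
    "The refund policy allows returns within 30 days of purchase.",
    "Our office is closed on public holidays.",
    "Shipping within Europe takes five business days.",
]
question = "How many days do I have to return a purchase?"

index = build_index(chunks)
top = retrieve(index, question, k=1)
prompt = build_prompt(question, top)  # this string would be sent to the LLM
print(prompt)
```

Because the toy embedding only counts shared words, it also shows the key problem in miniature: a question phrased with different vocabulary than its answer would score poorly.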

*Major Pain Points and Solutions*

*Common RAG Failures*:
- Bad retrieval (low precision/recall, irrelevant data, hallucination)
- Poor response generation (hallucination, irrelevance, bias)
- Difficulty handling structured/unstructured data
- Complex question processing

*Core Improvement Strategies*:
- *Hybrid search*: Combining keyword matching with semantic vector search
- *Re-ranking*: Using LLMs to score and prioritize retrieved chunks
- *Multi-agent systems*: Separate agents for query understanding, retrieval, ranking, and response generation
- *Question vector indexing*: Generate "exam questions" from text chunks and search in question space rather than text space

*Advanced Techniques*

*Context Engineering Philosophy*: Moving from "dump everything into context" to sophisticated context curation. The key insight is that tight, well-curated context (20-40 chunks) often outperforms stuffing entire context windows due to "context rot" - model performance degrading as context length increases.

*Specific Advanced Methods*:
- Small-to-big retrieval (embed sentences, return with surrounding context)
- Deep Memory (neural network optimization of embedding space)
- Fine-tuning embeddings and LLMs
- Semantic caching and long-term memory
- Query transformation and enrichment

*Graph RAG and Alternative Approaches*

*GraphRAG*: Microsoft's approach using LLMs to create knowledge graphs with Graph Neural Networks for both embedding and retrieval. Other graph database solutions include Neo4j, Amazon Neptune, and various specialized platforms.

*RAG 2.0*: Instead of using separate frozen components, this approach pretrains and fine-tunes all components (embeddings, retrieval, generation) as an integrated system.

*Practical Implementation*

*Ready-to-Use Tools*: The slides highlight several MCP (Model Context Protocol) servers and evaluation frameworks like RAGAS, making it easier to implement RAG without building from scratch.

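As an illustration of the hybrid search strategy listed above, here is a minimal sketch that blends a keyword-overlap score with a vector similarity score. The weighting scheme (`alpha`), the function names, the sample documents, and the three-dimensional "embeddings" are all assumptions made for the example; the video does not prescribe a specific formula.

```python
import math
import re


def keyword_score(query, text):
    """Fraction of query terms that appear verbatim in the text."""
    q_terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    t_terms = set(re.findall(r"[a-z0-9]+", text.lower()))
    return len(q_terms & t_terms) / len(q_terms) if q_terms else 0.0


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(query, query_vec, docs, alpha=0.5, k=2):
    """Rank docs by alpha * vector score + (1 - alpha) * keyword score."""
    scored = []
    for text, vec in docs:
        score = alpha * cosine(query_vec, vec) + (1 - alpha) * keyword_score(query, text)
        scored.append((score, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical documents with made-up embedding vectors.
docs = [
    ("Error code E42 means the sensor cable is unplugged.", [0.1, 0.9, 0.2]),
    ("Troubleshooting guide for common hardware faults.", [0.8, 0.5, 0.1]),
    ("Holiday schedule for the support team.", [0.0, 0.1, 0.9]),
]
query = "E42 error code"
query_vec = [0.9, 0.4, 0.1]  # made-up vector that sits closest to the generic guide

vector_only = hybrid_search(query, query_vec, docs, alpha=1.0, k=1)
hybrid = hybrid_search(query, query_vec, docs, alpha=0.5, k=1)
print("vector only:", vector_only[0][1])
print("hybrid:     ", hybrid[0][1])
```

With these made-up vectors, pure vector search ranks the generic troubleshooting guide first, while the blended score surfaces the chunk that contains the exact identifier `E42`. Exact identifiers and codes are a typical case where keyword matching helps.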
*Evaluation*: The talk emphasizes the critical importance of creating small "golden datasets" for quantitative measurement and continuous improvement.

The overarching theme is that modern RAG is evolving from simple vector similarity search to sophisticated, multi-stage systems that treat context engineering as a precision discipline, focused on measurable improvements rather than naive "throw everything at the model" approaches.
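To make the "golden dataset" idea concrete, here is a minimal sketch that scores a retriever with recall@k over a handful of hand-labeled questions. It does not use the RAGAS API. The metric choice, the dataset layout, the document IDs, and `fake_retriever` are all hypothetical and stand in for a real retrieval pipeline.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Share of the relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    top_k = set(retrieved_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)


def evaluate(golden, retriever, k=3):
    """Mean recall@k of a retriever over the golden dataset."""
    scores = [
        recall_at_k(retriever(item["question"]), item["relevant_ids"], k)
        for item in golden
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical golden dataset: questions with hand-labeled relevant chunk IDs.
golden = [
    {"question": "What is the refund window?", "relevant_ids": ["doc-1"]},
    {"question": "How long does shipping take?", "relevant_ids": ["doc-3", "doc-4"]},
]


def fake_retriever(question):
    """Stand-in for a real retriever: returns canned ranked IDs."""
    canned = {
        "What is the refund window?": ["doc-1", "doc-2", "doc-5"],
        "How long does shipping take?": ["doc-3", "doc-2", "doc-1"],
    }
    return canned.get(question, [])


mean_recall = evaluate(golden, fake_retriever, k=3)
print(f"mean recall@3 = {mean_recall:.2f}")
```

Re-running the same small dataset after each change (chunk size, embedding model, re-ranker) gives a number to compare, which is the kind of measurable improvement the summary refers to.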
