Jerry Liu - Keynote: Building and Productionizing RAG | PyData Global 2023

Learn how to build and deploy production-ready RAG systems with Jerry Liu, co-founder and CEO of LlamaIndex. The keynote covers architecture, optimization, evaluation, and best practices for retrieval-augmented generation.

Key takeaways
  • RAG (retrieval-augmented generation) consists of two main stages, retrieval and synthesis: relevant context is retrieved from external data sources and handed to the LLM so its answer is grounded in your data (see the sketch below)
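
A minimal sketch of that two-stage pipeline using LlamaIndex, the library Liu maintains, assuming its pre-0.10 import paths, a populated `./data` directory, and default OpenAI models with `OPENAI_API_KEY` set:

```python
# Retrieval + synthesis in a few lines of LlamaIndex (pre-0.10 imports;
# newer versions move these under llama_index.core).
from llama_index import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("data").load_data()  # load and parse files
index = VectorStoreIndex.from_documents(documents)     # chunk, embed, store

# Retrieval: fetch the top-3 most similar chunks for the query.
# Synthesis: stuff them into a prompt and have the LLM answer from them.
query_engine = index.as_query_engine(similarity_top_k=3)
print(query_engine.query("What does the document say about X?"))
```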

  • Common RAG challenges include:

    • Bad retrieval (low precision surfaces irrelevant chunks; low recall misses relevant ones)
    • Context window limitations (retrieved context may not all fit in the prompt)
    • "Lost in the middle": models tend to under-use information buried mid-context
    • Hallucination (answers not grounded in the retrieved context)
    • Response quality problems (incomplete, repetitive, or off-topic answers)
  • Key optimization strategies (a rank-fusion sketch follows this group):

    • Tune chunk sizes (smaller chunks sharpen retrieval relevance; larger ones preserve context)
    • Use metadata filtering to narrow the search space
    • Implement hierarchical summarization
    • Apply re-ranking and fusion techniques
    • Fine-tune embeddings and prompts
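
One concrete instance of the fusion point above is reciprocal rank fusion (RRF), a standard way to merge rankings from several retrievers (for example vector search plus keyword search). A pure-Python sketch; the chunk ids are hypothetical and `k=60` is the conventional RRF constant:

```python
# Reciprocal rank fusion (RRF): merge ranked chunk lists from several
# retrievers into one ranking. Each list contributes 1 / (k + rank) per chunk.
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)   # higher ranks contribute more
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical example: vector search and BM25 disagree; fusion balances them.
vector_hits = ["chunk_3", "chunk_1", "chunk_7"]
bm25_hits   = ["chunk_1", "chunk_9", "chunk_3"]
print(reciprocal_rank_fusion([vector_hits, bm25_hits]))
# -> ['chunk_1', 'chunk_3', 'chunk_9', 'chunk_7']
```
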
  • Best practices for building RAG systems (a retrieval-eval sketch follows this group):

    • Create proper evaluation datasets
    • Test components in isolation (retrieval first, then synthesis)
    • Monitor both qualitative and quantitative metrics
    • Consider latency, cost, and safety requirements
    • Start simple before adding complexity
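
A sketch of testing retrieval in isolation, as recommended above: hit rate and MRR over a hand-labeled dataset of (question, id of the chunk that should answer it) pairs. The `retrieve` callable and the dataset are placeholders you supply:

```python
# Retrieval-only evaluation: ignore synthesis entirely and just measure
# whether the right chunk comes back, and how high it ranks.
from typing import Callable

def evaluate_retrieval(retrieve: Callable[[str], list[str]],
                       dataset: list[tuple[str, str]]) -> dict[str, float]:
    hits, reciprocal_ranks = 0, []
    for question, expected_id in dataset:
        retrieved = retrieve(question)           # ranked chunk ids, top-k
        if expected_id in retrieved:
            hits += 1
            reciprocal_ranks.append(1.0 / (retrieved.index(expected_id) + 1))
        else:
            reciprocal_ranks.append(0.0)
    return {"hit_rate": hits / len(dataset),
            "mrr": sum(reciprocal_ranks) / len(dataset)}
```
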
  • Advanced RAG architectures (hierarchical-retrieval sketch below) can include:

    • Multi-document agents
    • Hierarchical retrieval
    • Query decomposition
    • Structured data integration
    • Tool-based augmentation
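
A sketch of the hierarchical-retrieval idea: route the query to the best-matching document via per-document summary embeddings, then rank chunks only within that document. The `embed` callable and the `docs` layout are assumptions for illustration:

```python
# Two-level retrieval: coarse routing over document summaries, then
# fine-grained chunk ranking inside the chosen document.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def hierarchical_retrieve(query: str, docs: dict, embed, top_k: int = 3):
    q = embed(query)
    # Level 1: pick the document whose summary best matches the query.
    best_doc = max(docs, key=lambda d: cosine(q, docs[d]["summary_emb"]))
    # Level 2: rank that document's chunks against the query.
    chunks = docs[best_doc]["chunks"]            # [(text, embedding), ...]
    ranked = sorted(chunks, key=lambda c: cosine(q, c[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]
```
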
  • Agents can enhance RAG (query-decomposition sketch below) by:

    • Breaking complex queries into sub-questions
    • Managing conversation state
    • Combining multiple tools and data sources
    • Providing structured outputs
    • Handling workflow automation
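
A sketch of query decomposition, the first agent capability above. It assumes the openai>=1.0 client, an era-appropriate model name, and that the model replies with bare JSON; each sub-question would then be answered by the RAG pipeline:

```python
# Query decomposition: have an LLM split a complex question into standalone
# sub-questions that can each be answered against the index.
import json
from openai import OpenAI

client = OpenAI()

def decompose(question: str) -> list[str]:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content":
                   "Break this question into 2-4 standalone sub-questions. "
                   f"Reply with only a JSON list of strings.\n\n{question}"}],
    )
    return json.loads(resp.choices[0].message.content)

sub_questions = decompose("How did 2022 revenue compare to 2021?")
# Next: run each sub-question through the RAG query engine, then pass the
# (sub-question, answer) pairs back to the LLM for a final synthesized answer.
```
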
  • Evaluation (faithfulness-check sketch below) should consider:

    • Retrieval accuracy (are the right chunks returned?)
    • Response relevance (does the answer address the question?)
    • Response faithfulness (is the answer grounded in the retrieved context?)
    • End-to-end performance metrics
    • Cost and latency tradeoffs
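
A sketch of a faithfulness check in the LLM-as-judge style: ask a strong model whether every claim in the answer is supported by the retrieved context. The prompt wording and model name are assumptions (libraries such as LlamaIndex also ship ready-made evaluators):

```python
# Faithfulness judge: does the generated answer stay grounded in the
# retrieved context, or does it hallucinate unsupported claims?
from openai import OpenAI

client = OpenAI()

def is_faithful(context: str, answer: str) -> bool:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content":
                   "Context:\n" + context +
                   "\n\nAnswer:\n" + answer +
                   "\n\nIs every claim in the answer supported by the "
                   "context? Reply YES or NO."}],
    )
    return resp.choices[0].message.content.strip().upper().startswith("YES")
```
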
  • Production considerations (fallback sketch below):

    • Vector database selection
    • Embedding model choice
    • Prompt engineering
    • Monitoring and observability
    • Error handling and fallbacks
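
A sketch of error handling with a fallback model, one of the production concerns above; the `primary` and `fallback` callables stand in for whatever LLM clients you use:

```python
# Retry the primary model, fall back to a secondary one on repeated failure,
# and log latency so the numbers feed into monitoring/observability.
import logging, time

log = logging.getLogger("rag")

def answer_with_fallback(query: str, primary, fallback, retries: int = 2):
    for attempt in range(retries):
        start = time.perf_counter()
        try:
            result = primary(query)
            log.info("primary ok in %.2fs", time.perf_counter() - start)
            return result
        except Exception as exc:                 # timeouts, rate limits, ...
            log.warning("primary failed (attempt %d): %s", attempt + 1, exc)
    log.info("falling back to secondary model")
    return fallback(query)
```
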
  • Model selection (cost sketch below) impacts:

    • Cost vs. performance tradeoffs
    • Capability differences between models
    • Open source vs. proprietary options
    • Context window limitations
    • Structured output quality
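
A back-of-the-envelope cost model for weighing those tradeoffs. Every number in the example call is a hypothetical placeholder, not real pricing:

```python
# Monthly LLM cost for a RAG workload: (input + output token cost per query)
# scaled by query volume. Plug in your provider's current prices.
def monthly_cost(queries_per_day: int, prompt_tokens: int, output_tokens: int,
                 price_in_per_1k: float, price_out_per_1k: float) -> float:
    per_query = (prompt_tokens / 1000) * price_in_per_1k \
              + (output_tokens / 1000) * price_out_per_1k
    return per_query * queries_per_day * 30

# RAG prompts are long (query + retrieved chunks), so input pricing dominates.
print(monthly_cost(10_000, prompt_tokens=3_000, output_tokens=300,
                   price_in_per_1k=0.01, price_out_per_1k=0.03))
# -> 11700.0 (USD/month under these made-up numbers)
```
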
  • Fine-tuning opportunities (embedding fine-tuning sketch below) exist for:

    • Embedding models
    • Response synthesis
    • Query generation
    • Ranking algorithms
    • Domain adaptation
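
A sketch of embedding fine-tuning using sentence-transformers' classic fit API (pre-3.0). (Synthetic question, source chunk) pairs act as positives, and MultipleNegativesRankingLoss treats other in-batch chunks as negatives; the base model name and training pairs are illustrative:

```python
# Fine-tune an embedding model on (question, chunk) positives so retrieval
# adapts to your domain's vocabulary and phrasing.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

pairs = [  # hypothetical data, e.g. LLM-generated questions per chunk
    ("What was 2022 revenue?", "Revenue for fiscal 2022 was ..."),
    ("Who audited the filing?", "The financial statements were audited by ..."),
]
examples = [InputExample(texts=[q, chunk]) for q, chunk in pairs]
loader = DataLoader(examples, batch_size=16, shuffle=True)

model = SentenceTransformer("BAAI/bge-small-en-v1.5")
loss = losses.MultipleNegativesRankingLoss(model)
model.fit(train_objectives=[(loader, loss)], epochs=2, warmup_steps=10)
model.save("fine_tuned_embeddings")
```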