Building Documentation Search with RAG: Lessons from scikit-learn [PyCon DE & PyData Berlin 2024]

Learn best practices for building documentation search with RAG, including chunking strategies, cost optimization, and evaluation. See real examples from scikit-learn.

Key takeaways
  • RAG (Retrieval-Augmented Generation) builds better documentation search systems by combining traditional search with LLMs

  • Chunking strategy is critical for RAG success; different document types require different chunking approaches:

    • API documentation needs parameter-level chunking
    • User guides work better with section-level chunks
    • Avoid cutting sections in the middle (see the chunking sketch after this list)
  • Cost and performance considerations:

    • CPU-only solutions are much slower but cheaper than GPU-based ones
    • BM25 (lexical search) is 100x faster than BERT-based retrieval
    • Consider costs of API calls and GPU resources at scale
  • Multi-stage retrieval improves results:

    • Use BM25 for initial document retrieval
    • Re-rank results using BERT/transformers
    • Pass the top-ranked chunks to the LLM for the final answer (see the pipeline sketch after this list)
  • Local deployment benefits:

    • More control over the stack
    • Lower costs than cloud APIs
    • Better for prototyping and development
  • Evaluation and feedback are challenging for open source projects:

    • Hard to collect user interaction data
    • No clear metrics for success
    • Need to manually check failure cases
  • Important to validate and verify LLM outputs:

    • LLMs can hallucinate incorrect information
    • Context/retrieval quality directly impacts answer quality
    • Consider zero-shot prompting (see the prompt sketch after this list)
  • For scikit-learn specifically:

    • API docs, user guides, and examples each need their own handling
    • Structure from NumPyDoc helps with parsing (see the docstring sketch after this list)
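
Code sketches

The talk did not walk through code, so the sketches below are illustrative only. First, section-level chunking: a hypothetical chunk_by_section helper that splits markdown on headings so no chunk straddles a section boundary, falling back to paragraph breaks for oversized sections (the heading regex and max_chars limit are assumptions).

```python
import re

def chunk_by_section(markdown_text, max_chars=2000):
    """Split a markdown document on headings so that no chunk
    straddles a section boundary (hypothetical helper)."""
    # Split immediately before every heading line (e.g. "## Usage");
    # the lookahead keeps each heading attached to its own section.
    sections = re.split(r"(?m)^(?=#{1,6} )", markdown_text)
    chunks = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        if len(section) <= max_chars:
            chunks.append(section)
            continue
        # Oversized section: fall back to paragraph-level splits,
        # still never cutting in the middle of a paragraph.
        buffer = ""
        for paragraph in section.split("\n\n"):
            if buffer and len(buffer) + len(paragraph) > max_chars:
                chunks.append(buffer.strip())
                buffer = ""
            buffer += paragraph + "\n\n"
        if buffer.strip():
            chunks.append(buffer.strip())
    return chunks
```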
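
Next, the multi-stage retrieval idea: BM25 for a cheap lexical first pass, a transformer cross-encoder to re-rank the few survivors, and the winners joined into LLM context. A minimal sketch assuming the rank-bm25 and sentence-transformers packages; the toy corpus, query, and model name are placeholders.

```python
from rank_bm25 import BM25Okapi                  # pip install rank-bm25
from sentence_transformers import CrossEncoder   # pip install sentence-transformers

# Toy corpus standing in for chunked scikit-learn documentation.
documents = [
    "RandomForestClassifier: a meta estimator that fits decision trees.",
    "fit(X, y) trains the estimator on feature matrix X and targets y.",
    "Cross-validation evaluates an estimator on held-out splits.",
]

query = "how do I train a random forest"

# Stage 1: BM25 lexical retrieval (CPU-only, very fast).
bm25 = BM25Okapi([d.lower().split() for d in documents])
scores = bm25.get_scores(query.lower().split())
top = sorted(range(len(documents)), key=scores.__getitem__, reverse=True)[:2]
candidates = [documents[i] for i in top]

# Stage 2: re-rank only the few candidates with a transformer
# cross-encoder (the model name is an assumption; any re-ranker works).
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
pair_scores = reranker.predict([(query, d) for d in candidates])
reranked = [d for _, d in sorted(zip(pair_scores, candidates),
                                 key=lambda t: t[0], reverse=True)]

# Stage 3: the top-ranked chunks become the LLM's context.
context = "\n\n".join(reranked)
```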
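
For zero-shot prompting, one way to make hallucinations easier to catch is to instruct the model to answer only from the retrieved chunks. The exact wording below is an assumption, not the prompt used in the talk.

```python
def build_prompt(query: str, context: str) -> str:
    """Zero-shot prompt (no worked examples) that pins the answer
    to the retrieved chunks, so a missing answer is admitted rather
    than hallucinated."""
    return (
        "Answer the question using only the scikit-learn documentation "
        "excerpts below. If the excerpts do not contain the answer, say "
        "you do not know instead of guessing.\n\n"
        f"Excerpts:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
```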
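
Finally, NumPyDoc structure: the numpydoc package parses a NumPyDoc-formatted docstring into named sections, which maps directly onto the parameter-level chunking mentioned above. Using RandomForestClassifier purely as an example:

```python
from numpydoc.docscrape import NumpyDocString   # pip install numpydoc
from sklearn.ensemble import RandomForestClassifier

# NumPyDoc-formatted docstrings have named sections (Parameters,
# Returns, Examples, ...), which double as natural chunk boundaries.
doc = NumpyDocString(RandomForestClassifier.__doc__)

# Parameter-level chunks: one entry per documented parameter.
param_chunks = [
    f"RandomForestClassifier.{p.name} ({p.type}): " + " ".join(p.desc)
    for p in doc["Parameters"]
]
print(param_chunks[0])  # e.g. the n_estimators parameter
```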