Learn best practices for building documentation search with RAG, including chunking strategies, cost optimization, and evaluation. See real examples from scikit-learn.
- RAG (Retrieval Augmented Generation) combines traditional search with LLMs to build better documentation search systems.
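A minimal sketch of that flow, with `retrieve` and `complete` as hypothetical stand-ins for a real retriever and LLM client (these notes don't pin down the talk's actual stack):

```python
# Minimal RAG flow: retrieve relevant chunks, stuff them into the
# prompt, and let the LLM generate the answer.
def answer(query: str, retrieve, complete, k: int = 5) -> str:
    chunks = retrieve(query, k=k)   # top-k documentation chunks (assumed helper)
    context = "\n\n".join(chunks)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
    return complete(prompt)         # LLM call (assumed helper)
```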
- Chunking strategy is critical for RAG success; different document types require different chunking approaches (see the sketch after this list):
  - API documentation needs parameter-level chunking
  - User guides work better with section-level chunks
  - Avoid cutting sections in the middle
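One way to get section-level chunks without cutting a section in the middle is to split only at heading lines, so each section stays whole. A rough sketch for Markdown-style pages (illustrative; the talk's own chunker isn't shown in these notes):

```python
import re

def section_chunks(text: str) -> list[str]:
    """Split a Markdown-style page into whole-section chunks."""
    chunks, current = [], []
    for line in text.splitlines():
        # A new heading closes the previous section, so no section
        # is ever cut in the middle.
        if re.match(r"^#{1,6}\s", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return chunks
```

Parameter-level chunking for API docs follows the same idea, but with a docstring parser instead of headings (see the NumPyDoc sketch at the end).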
- Cost and performance considerations (see the sketch after this list):
  - CPU-only solutions are much slower but cheaper than GPUs
  - BM25 (lexical search) is 100x faster than BERT-based retrieval
  - Account for the cost of API calls and GPU resources at scale
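Lexical retrieval is the cheap, CPU-only baseline. A sketch using the `rank_bm25` package (an assumed choice here; any BM25 implementation works the same way):

```python
from rank_bm25 import BM25Okapi

corpus = [
    "StandardScaler standardizes features by removing the mean.",
    "GridSearchCV exhaustively searches over a parameter grid.",
    "Pipelines chain transformers with a final estimator.",
]

# BM25 needs only tokenized text: no embeddings, no GPU.
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])

query = "how to scale features"
print(bm25.get_top_n(query.lower().split(), corpus, n=2))
```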
- Multi-stage retrieval improves results (sketched after this list):
  - Use BM25 for initial document retrieval
  - Re-rank the results using BERT/transformer models
  - Pass the most relevant chunks to the LLM for the final answer
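A sketch of the two-stage pattern, with BM25 as the cheap first stage and a `sentence-transformers` cross-encoder for re-ranking (the exact checkpoint is an assumption, not necessarily the talk's):

```python
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder

corpus = [
    "fit_transform fits to data, then transforms it.",
    "Pipelines chain transformers with a final estimator.",
    "cross_val_score evaluates a score by cross-validation.",
]
query = "how do I chain preprocessing and a model"

# Stage 1: cheap lexical retrieval over the whole corpus.
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])
candidates = bm25.get_top_n(query.lower().split(), corpus, n=2)

# Stage 2: re-rank only the few candidates with a slower cross-encoder.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(query, doc) for doc in candidates])
reranked = [doc for _, doc in sorted(zip(scores, candidates), reverse=True)]

# Stage 3: the top re-ranked chunks go into the LLM prompt.
print(reranked[0])
```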
- Local deployment benefits:
  - More control over the stack
  - Lower costs than cloud APIs
  - Better for prototyping and development
- Evaluation and feedback are challenging for open source projects:
  - User interaction data is hard to collect
  - There are no clear metrics for success
  - Failure cases must be checked manually
- It is important to validate and verify LLM outputs (see the prompt sketch after this list):
  - LLMs can hallucinate incorrect information
  - Context/retrieval quality directly impacts answer quality
  - Consider using zero-shot prompting
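One zero-shot mitigation is to constrain the model to the retrieved context and give it an explicit way out; the prompt wording below is an illustrative assumption:

```python
def grounded_prompt(query: str, chunks: list[str]) -> str:
    """Zero-shot prompt: no examples, just explicit grounding rules."""
    context = "\n\n".join(chunks)
    return (
        "You are a scikit-learn documentation assistant. Answer ONLY "
        "from the context below. If the context does not contain the "
        "answer, say you don't know instead of guessing.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
```

This doesn't eliminate hallucinations, which is why retrieval quality still matters and answers still need spot-checking.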
- For scikit-learn specifically:
  - Different documentation types need different handling
  - API docs, user guides, and examples each require their own approach
  - The structure imposed by NumPyDoc helps with parsing (see the sketch after this list)
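Because scikit-learn docstrings follow the NumPyDoc format, `numpydoc` can parse them into parameter-level chunks directly; a sketch (the talk's exact pipeline may differ):

```python
from numpydoc.docscrape import NumpyDocString

docstring = """Scale features to zero mean and unit variance.

Parameters
----------
copy : bool, default=True
    If False, try to avoid a copy and scale in place.
with_mean : bool, default=True
    If True, center the data before scaling.
"""

doc = NumpyDocString(docstring)

# One chunk per parameter: name, type, and description stay together,
# which is exactly the parameter-level chunking API docs need.
param_chunks = [
    f"{p.name} : {p.type}\n" + "\n".join(p.desc)
    for p in doc["Parameters"]
]
for chunk in param_chunks:
    print(chunk, end="\n\n")
```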