A Retrieval Augmented Generation system to query the scikit-learn documentation

Learn how to build a documentation search system using RAG, sentence-BERT, and Mistral LLM to improve scikit-learn's docs searchability while keeping costs low.

Key takeaways
  • A RAG (Retrieval-Augmented Generation) system was built to improve the searchability of the scikit-learn documentation by combining text retrieval with LLM-generated responses

  • Different documentation types (API docs, user guides, tutorials) require different chunking strategies to effectively break down content for retrieval

  • The system uses sentence-BERT for encoding text and BM25 for retrieval, with a BERT-based re-ranker to improve result quality

  • Mistral was chosen as the LLM because it can run locally with reasonable performance on CPU/M2 hardware

  • A proper chunking strategy is critical: naive chunking often fails and must be adapted to the document's structure and content type

  • Performance considerations: retrieval takes ~3 seconds on CPU; total response time is around 5 seconds including LLM generation

  • Cost was a major factor - cloud API calls would be prohibitively expensive at scikit-learn’s scale of ~1M monthly documentation users

  • The system lacks evaluation metrics because no user-interaction data is collected, for privacy/GDPR reasons

  • The implementation focuses on pure Python code without complex dependencies for better maintainability

  • Current limitations include the inability to process images in the documentation and potential LLM hallucinations when the retrieved context is ambiguous
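To make the retrieval step above concrete, here is a minimal, self-contained sketch of BM25 lexical scoring over pre-chunked documentation snippets. This is an illustration of the technique, not the project's actual code: the tokenizer, parameters (`k1`, `b`), and example snippets are all assumptions, and the real system additionally uses sentence-BERT embeddings and a BERT-based re-ranker on top of this first-stage retrieval.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each doc chunk against the query with Okapi BM25.

    Whitespace tokenization is a simplification; a real system
    would use a proper tokenizer and stemming/normalization.
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: in how many chunks does each term appear?
    df = Counter()
    for doc in tokenized:
        for term in set(doc):
            df[term] += 1
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical documentation chunks, just for illustration.
chunks = [
    "RandomForestClassifier fits a number of decision tree classifiers on sub-samples",
    "StandardScaler standardizes features by removing the mean and scaling to unit variance",
    "GridSearchCV performs an exhaustive search over specified parameter values",
]
scores = bm25_scores("decision tree classifiers", chunks)
best = max(range(len(chunks)), key=scores.__getitem__)
print(chunks[best])  # the RandomForestClassifier chunk ranks first
```

In the full pipeline, the top-k chunks returned by this stage would be re-scored by a cross-encoder re-ranker before being passed as context to the LLM.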