A Retrieval Augmented Generation system to query the scikit-learn documentation
Learn how to build a documentation search system using RAG, sentence-BERT, and Mistral LLM to improve scikit-learn's docs searchability while keeping costs low.
- A RAG (Retrieval-Augmented Generation) system was built to improve the searchability of the scikit-learn documentation by combining text retrieval with LLM-generated responses
- Different documentation types (API docs, user guides, tutorials) require different chunking strategies to effectively break down content for retrieval
- The system uses sentence-BERT for encoding text and BM25 for retrieval, with a BERT-based re-ranker to improve result quality
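The BM25 retrieval step can be sketched in pure Python. This is an illustrative stand-in, not the project's actual code; the toy corpus, tokenization, and function names are assumptions:

```python
import math
from collections import Counter

def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score each document in `corpus_tokens` against the query with Okapi BM25."""
    n_docs = len(corpus_tokens)
    avg_len = sum(len(d) for d in corpus_tokens) / n_docs
    # Document frequency for each distinct query term.
    df = {t: sum(1 for d in corpus_tokens if t in d) for t in set(query_tokens)}
    scores = []
    for doc in corpus_tokens:
        tf = Counter(doc)
        score = 0.0
        for t in query_tokens:
            if t not in tf:
                continue
            idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            num = tf[t] * (k1 + 1)
            den = tf[t] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * num / den
        scores.append(score)
    return scores

# Toy documentation snippets, whitespace-tokenized for simplicity.
corpus = [
    "the fit method trains the estimator on data".split(),
    "cross validation evaluates model performance".split(),
    "the predict method returns predicted labels".split(),
]
query = "how to fit an estimator".split()
scores = bm25_scores(query, corpus)
best = max(range(len(corpus)), key=scores.__getitem__)  # index of top document
```

In the full pipeline, the BM25 candidates would then be passed to the BERT-based re-ranker for final ordering.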
- Mistral was chosen as the LLM because it can run locally with reasonable performance on CPU/M2 hardware
- A proper chunking strategy is critical: naive chunking often fails and must be adapted to document structure and content type
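One way to adapt chunking to document structure is to split on headings rather than at fixed character offsets. This sketch is illustrative (not the project's actual chunker) and assumes Markdown-style headings:

```python
import re

def chunk_by_sections(doc, max_chars=500):
    """Split a Markdown-style document on headings, keeping each section
    together instead of cutting at arbitrary character offsets."""
    sections = re.split(r"(?m)^(?=#{1,3} )", doc)
    chunks = []
    for sec in sections:
        sec = sec.strip()
        if not sec:
            continue
        if len(sec) <= max_chars:
            chunks.append(sec)
        else:
            # Fall back to paragraph-level splitting for oversized sections.
            for para in sec.split("\n\n"):
                para = para.strip()
                if para:
                    chunks.append(para)
    return chunks

doc = "# fit\nTrains the model.\n\n# predict\nReturns labels.\n"
chunks = chunk_by_sections(doc)
```

API reference pages, user guides, and tutorials would each call for their own variant of this splitting logic.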
- Performance: retrieval takes ~3 seconds on CPU, and total response time is around 5 seconds including LLM generation
- Cost was a major factor: cloud API calls would be prohibitively expensive at scikit-learn's scale of ~1M monthly documentation users
- The system lacks evaluation metrics because no user-interaction data is collected, for privacy/GDPR reasons
- The implementation focuses on pure Python code without complex dependencies, for better maintainability
- Current limitations include the inability to process images in documentation and potential hallucinations from the LLM when context is ambiguous
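A common mitigation for hallucination is to ground the prompt in the retrieved chunks and instruct the model to answer only from them. This is a generic sketch, not the project's actual prompt; the wording and helper name are assumptions:

```python
def build_grounded_prompt(question, chunks):
    """Assemble a prompt that asks the model to answer only from the
    retrieved documentation chunks, reducing hallucination risk."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using ONLY the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_grounded_prompt(
    "How do I fit a model?",
    ["The fit method trains the estimator on the training data."],
)
```

This does not eliminate hallucinations, but it gives the model an explicit escape hatch when the retrieved context is ambiguous.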