Crafting your own RAG system: Leveraging 30+ LLMs for enhanced performance by Stephan Janssen

Learn how to craft your own RAG system using 30+ LLMs, enhance performance through careful selection and implementation, and explore re-ranking, query expansion, and answer generation techniques.

Key takeaways
  • Crafting your own RAG (Retrieval-Augmented Generation) system is possible using 30+ LLMs (Large Language Models).
  • Leveraging LLMs can enhance performance, but requires careful selection and implementation.
  • LLMs can be used for re-ranking, query expansion, and answer generation (a re-ranking sketch follows this list).
  • Choosing the right embedding model is crucial for semantic search, with options ranging from local and open source models to proprietary ones.
  • Embeddings convert text into numerical vector representations, enabling efficient similarity querying and ranking (see the embedding sketch after this list).
  • The models behind a RAG system can be trained using various techniques, including supervised, unsupervised, and reinforcement learning.
  • Local models can be used to generate responses, with runtimes such as Ollama, LM Studio, and GPT4All (see the local generation sketch after this list).
  • Deploying a RAG system requires consideration of scalability, latency, and cost.
  • Using local models can reduce latency and cost, but may require more development and maintenance effort.
  • Visualizing embeddings can help with understanding and debugging the system.
  • Search relevance can be improved by combining embedding-based retrieval with other ranking approaches such as BM25, DSSM, and co-attention models.
  • Quantizing embeddings can reduce storage requirements and improve inference speed (a minimal quantization sketch follows this list).
  • Open source libraries like LangChain4j can be used to simplify implementation and development.
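
To make the embedding and LangChain4j takeaways concrete, here is a minimal sketch that embeds a few documents with a local all-MiniLM-L6-v2 model and runs a similarity search over an in-memory store. It assumes the langchain4j core and langchain4j-embeddings-all-minilm-l6-v2 dependencies; class names and methods (AllMiniLmL6V2EmbeddingModel, InMemoryEmbeddingStore, findRelevant) follow older 0.x releases of the library and may differ in the version you use.

```java
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.AllMiniLmL6V2EmbeddingModel;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingMatch;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;

import java.util.List;

public class EmbeddingSearchExample {

    public static void main(String[] args) {
        // Local, open source embedding model (runs in-process, no API key needed)
        EmbeddingModel embeddingModel = new AllMiniLmL6V2EmbeddingModel();

        // In-memory vector store; a real deployment would use a persistent vector database
        InMemoryEmbeddingStore<TextSegment> store = new InMemoryEmbeddingStore<>();

        // Convert documents into numerical representations and index them
        for (String doc : List.of(
                "Devoxx is a developer conference held in Antwerp.",
                "LangChain4j brings LLM orchestration to Java.",
                "Embeddings map text to dense numeric vectors.")) {
            TextSegment segment = TextSegment.from(doc);
            Embedding embedding = embeddingModel.embed(segment).content();
            store.add(embedding, segment);
        }

        // Embed the query and retrieve the most similar segments
        Embedding queryEmbedding = embeddingModel.embed("What is LangChain4j?").content();
        List<EmbeddingMatch<TextSegment>> matches = store.findRelevant(queryEmbedding, 2);

        matches.forEach(m ->
                System.out.println(m.score() + " -> " + m.embedded().text()));
    }
}
```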
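
The re-ranking idea can be sketched in a few lines of Java as well: ask a chat model to grade each retrieved passage for relevance to the query and sort candidates by the returned score. The prompt wording, the 0 to 10 scale, and the one-call-per-candidate loop are illustrative choices, not a prescribed LangChain4j API; only ChatLanguageModel.generate(String) is assumed from the library.

```java
import dev.langchain4j.model.chat.ChatLanguageModel;

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Re-rank retrieved passages by asking an LLM to grade their relevance to the query.
// Naive sketch: one model call per candidate, parsing a single number back.
public class LlmReRanker {

    private record Scored(String passage, double score) {}

    private final ChatLanguageModel model;

    public LlmReRanker(ChatLanguageModel model) {
        this.model = model;
    }

    public List<String> reRank(String query, List<String> candidates) {
        List<Scored> scored = new ArrayList<>();
        for (String candidate : candidates) {
            String prompt = """
                    Rate from 0 to 10 how relevant the passage is to the question.
                    Answer with a single number only.
                    Question: %s
                    Passage: %s
                    """.formatted(query, candidate);
            scored.add(new Scored(candidate, parseScore(model.generate(prompt))));
        }
        scored.sort(Comparator.comparingDouble(Scored::score).reversed());
        return scored.stream().map(Scored::passage).toList();
    }

    private static double parseScore(String answer) {
        try {
            return Double.parseDouble(answer.replaceAll("[^0-9.]", ""));
        } catch (NumberFormatException e) {
            return 0.0; // unparsable answers rank last
        }
    }
}
```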
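
For local answer generation, LangChain4j provides an Ollama integration. The sketch below assumes an Ollama server running on its default port with a model such as llama3 already pulled; the builder options and the generate(String) call follow older library versions and may have changed in the release you use.

```java
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.ollama.OllamaChatModel;

public class LocalAnswerGeneration {

    public static void main(String[] args) {
        // Talks to a locally running Ollama server (default port 11434);
        // the model name must match one pulled with `ollama pull`.
        ChatLanguageModel model = OllamaChatModel.builder()
                .baseUrl("http://localhost:11434")
                .modelName("llama3")
                .temperature(0.2)
                .build();

        // Classic RAG prompt: retrieved context is pasted in front of the question
        String context = "LangChain4j is a Java library for building LLM-powered applications.";
        String question = "What is LangChain4j used for?";

        String answer = model.generate("""
                Answer the question using only the context below.
                Context: %s
                Question: %s
                """.formatted(context, question));

        System.out.println(answer);
    }
}
```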
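
Finally, a library-agnostic sketch of scalar quantization shows why it shrinks storage: each 32-bit float dimension is mapped to a single signed byte, roughly a 4x reduction, at the cost of some precision. The [min, max] value range would in practice be estimated from the indexed vectors; real vector stores offer tuned variants of this idea.

```java
// Minimal scalar quantization sketch: map each float32 dimension to a signed byte.
public final class EmbeddingQuantizer {

    // Quantize values in [min, max] to the byte range [-127, 127]
    public static byte[] quantize(float[] vector, float min, float max) {
        byte[] quantized = new byte[vector.length];
        float scale = 254f / (max - min);
        for (int i = 0; i < vector.length; i++) {
            float clamped = Math.max(min, Math.min(max, vector[i]));
            quantized[i] = (byte) Math.round((clamped - min) * scale - 127f);
        }
        return quantized;
    }

    // Approximate reconstruction, used when scoring against quantized vectors
    public static float[] dequantize(byte[] quantized, float min, float max) {
        float[] vector = new float[quantized.length];
        float scale = (max - min) / 254f;
        for (int i = 0; i < quantized.length; i++) {
            vector[i] = (quantized[i] + 127f) * scale + min;
        }
        return vector;
    }
}
```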