LLMs gone wild - Tess Ferrandez-Norlander - NDC Oslo 2024

Learn about key components, challenges, and best practices for building RAG systems with LLMs, including data preparation, accuracy metrics, and practical implementation tips.

Key takeaways
  • RAG (Retrieval-Augmented Generation) currently accounts for roughly 75% of LLM applications, grounding LLM outputs in factual data

  • Key components of successful RAG implementations:

    • Proper data preparation and chunking
    • Selection of appropriate embedding models
    • Effective metadata filtering
    • Guardrails against hallucinations and sensitive data
    • Evaluation frameworks for accuracy
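The data-preparation and retrieval components above can be sketched end to end. This is an illustrative toy, not the speaker's implementation: the word-based chunker and the bag-of-words "embedding" are stand-ins for a trained embedding model and a vector database.

```python
import math
from collections import Counter

def chunk(text, size=50, overlap=10):
    """Split text into overlapping word-based chunks (sizes in words).
    Overlap keeps context that would otherwise be cut at chunk borders."""
    words = text.split()
    step = size - overlap
    chunks = []
    for i in range(0, len(words), step):
        chunks.append(" ".join(words[i:i + size]))
        if i + size >= len(words):
            break
    return chunks

def embed(text):
    """Toy 'embedding': a bag-of-words term-count vector.
    A real RAG system uses a trained embedding model here."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

The retrieved chunks would then be pasted into the prompt as grounding context for the LLM.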
  • RAG pipeline accuracy varies by stage:

    • Generation accuracy: 60-85%
    • Retrieval accuracy: 60-85%
    • Combined RAG system accuracy: ~72%
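The ~72% combined figure is consistent with the two stages' errors compounding multiplicatively. Assuming the stages fail independently and each sits at the top of its 60-85% range:

```python
# If retrieval and generation each succeed ~85% of the time and fail
# independently, end-to-end accuracy is (at best) their product.
retrieval_accuracy = 0.85
generation_accuracy = 0.85
combined_accuracy = retrieval_accuracy * generation_accuracy  # 0.7225, i.e. ~72%
```

This is why optimizing a single stage in isolation has limited payoff: the weaker stage caps the whole system.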
  • Critical optimization areas:

    • Data ingestion and preparation
    • Chunk size and strategy
    • Context window management
    • Prompt engineering
    • Re-ranking of results
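Of these, re-ranking is easy to illustrate: a cheap first-pass retriever returns many candidates, then a stronger scorer reorders them before the best few go into the prompt. The `term_overlap` scorer below is a toy stand-in for the cross-encoder model a production system would use:

```python
def term_overlap(query: str, doc: str) -> float:
    """Toy relevance scorer: fraction of query terms present in the doc.
    A production re-ranker would use a cross-encoder model instead."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0

def rerank(query, candidates, scorer=term_overlap, top_k=3):
    """Second-pass re-ranking of first-stage retrieval candidates."""
    return sorted(candidates, key=lambda d: scorer(query, d), reverse=True)[:top_k]
```

Keeping only the top few re-ranked results also helps with context-window management, the item above.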
  • Common challenges:

    • Handling sensitive/PII data
    • Maintaining data freshness
    • Dealing with multi-modal content (images, tables)
    • Managing context windows
    • Preventing hallucinations
  • Key evaluation metrics:

    • Context precision
    • Context recall
    • Answer relevancy
    • Factual accuracy
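The two retrieval-side metrics reduce to set comparisons between the retrieved chunks and the chunks known to be relevant. A minimal sketch follows; frameworks like Ragas compute analogous scores but typically use an LLM judge to estimate relevance instead of ground-truth labels:

```python
def context_precision(retrieved, relevant):
    """Of the chunks we retrieved, what fraction were actually relevant?"""
    relevant_set = set(relevant)
    hits = sum(1 for c in retrieved if c in relevant_set)
    return hits / len(retrieved) if retrieved else 0.0

def context_recall(retrieved, relevant):
    """Of the relevant chunks that exist, what fraction did we retrieve?"""
    retrieved_set = set(retrieved)
    hits = sum(1 for c in relevant if c in retrieved_set)
    return hits / len(relevant) if relevant else 0.0
```

Answer relevancy and factual accuracy operate on the generated answer rather than the retrieved context, so they need a human or LLM judge rather than set arithmetic.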
  • Best practices:

    • Implement proper data filtering
    • Use metadata enrichment
    • Test with real users
    • Add guardrails for sensitive use cases
    • Monitor and evaluate system performance
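A guardrail for sensitive data can start as simply as redacting obvious PII patterns before text reaches the model or the logs. The patterns below are illustrative only and far from exhaustive; real deployments layer in dedicated PII-detection tooling:

```python
import re

# Illustrative patterns only: emails and US-style phone numbers.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")

def redact_pii(text: str) -> str:
    """Replace recognizable PII with placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text
```

Running this on both the retrieved context and the model's output covers the two directions sensitive data can leak.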
  • Tools and frameworks:

    • LangChain
    • LlamaIndex
    • Semantic Kernel
    • Various vector databases
    • Evaluation frameworks like Ragas