Talks - Jodie Burchell: Lies, damned lies and large language models

Explore how LLMs compress data, leading to hallucinations, and learn practical strategies to improve accuracy. Covers RAG, prompt engineering, and evaluation methods.

Key takeaways
  • LLMs are essentially doing “lossy compression” of their training data, leading to information loss and potential hallucinations

  • Two main types of hallucinations:

    • Faithfulness hallucinations: the model's output deviates from or contradicts the context it was given
    • Factuality hallucinations: the model generates claims that are factually incorrect, regardless of any supplied context
  • Common data quality issues contributing to hallucinations:

    • Training data containing misinformation and conspiracy theories
    • Low-quality sources
    • Inadequately filtered web content
    • Outdated information
  • Key methods to reduce hallucinations:

    • Retrieval Augmented Generation (RAG)
    • Better prompt engineering (a grounding-prompt sketch follows this list)
    • Domain-specific datasets
    • Self-refinement and collaborative refinement
    • Improved data filtering
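
One common prompt-engineering tactic against hallucinations is to instruct the model to answer only from supplied context and to say when it does not know. The sketch below is illustrative only; the template wording and the example context are assumptions, not material from the talk.

```python
# A minimal sketch of a hallucination-aware prompt template (illustrative;
# the exact wording and how you call the model are assumptions).

GROUNDED_PROMPT = """You are a careful assistant.
Answer the question using ONLY the context below.
If the context does not contain the answer, reply exactly: "I don't know."

Context:
{context}

Question: {question}
Answer:"""


def build_prompt(context: str, question: str) -> str:
    """Fill the template so the model is steered away from inventing facts."""
    return GROUNDED_PROMPT.format(context=context.strip(), question=question.strip())


if __name__ == "__main__":
    context = "The Eiffel Tower is 330 metres tall and is located in Paris, France."
    print(build_prompt(context, "How tall is the Eiffel Tower?"))
```
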
  • RAG implementation considerations (a minimal pipeline sketch follows this list):

    • Document chunk size
    • Choice of embedding model
    • Retrieval method
    • Vector database selection
    • Prompt construction
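
The sketch below ties those considerations together in one minimal retrieval pipeline. The chunk size, the `all-MiniLM-L6-v2` embedding model from `sentence-transformers`, the brute-force cosine retrieval (standing in for a vector database), and the prompt wording are all illustrative assumptions rather than the talk's recommendations.

```python
# Minimal RAG sketch: chunking, embedding, retrieval, prompt construction.
# A production setup would swap the brute-force search for a vector database.

import numpy as np
from sentence_transformers import SentenceTransformer

CHUNK_SIZE = 200   # characters per chunk: a tuning knob, not a recommendation
TOP_K = 3          # number of chunks to retrieve

model = SentenceTransformer("all-MiniLM-L6-v2")  # choice of embedding model


def chunk(text: str, size: int = CHUNK_SIZE) -> list[str]:
    """Naive fixed-size chunking; overlap or sentence-aware splitting are common refinements."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def retrieve(query: str, chunks: list[str], embeddings: np.ndarray, k: int = TOP_K) -> list[str]:
    """Brute-force cosine-similarity retrieval (stand-in for a vector database)."""
    q = model.encode([query])[0]
    sims = embeddings @ q / (np.linalg.norm(embeddings, axis=1) * np.linalg.norm(q))
    return [chunks[i] for i in np.argsort(sims)[::-1][:k]]


def build_rag_prompt(query: str, retrieved: list[str]) -> str:
    """Prompt construction: ground the model in the retrieved context only."""
    context = "\n---\n".join(retrieved)
    return (
        "Answer using only the context below. If the answer is not there, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


if __name__ == "__main__":
    document = (
        "Retrieval Augmented Generation grounds an LLM's answers in retrieved documents. "
        "Instead of relying only on what the model memorised during training, relevant "
        "text chunks are looked up at query time and placed into the prompt."
    )
    question = "How does RAG reduce reliance on memorised training data?"
    chunks = chunk(document)
    embeddings = model.encode(chunks)
    print(build_rag_prompt(question, retrieve(question, chunks, embeddings)))
```

Each of these choices (chunking strategy, embedding model, retriever, vector store, prompt) affects how often the model answers from retrieved evidence rather than from its compressed parametric memory.
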
  • Model size trends:

    • GPT-1: 120 million parameters
    • GPT-3: 175 billion parameters
    • GPT-4: an estimated 1 trillion parameters (size not officially disclosed)
    • Larger models can encode more information but remain prone to hallucinations
  • Measuring hallucination rates (an evaluation sketch follows this list):

    • TruthfulQA dataset for factuality
    • HaluEval (QA subset) for faithfulness
    • Multiple choice vs. open-ended evaluation
    • Need for domain-specific evaluation metrics
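
As a concrete example of multiple-choice factuality evaluation, the sketch below computes an MC1-style accuracy on TruthfulQA via the Hugging Face `datasets` library. The `ask_model` stub is a placeholder assumption; a real harness would score each candidate answer with the model under test (e.g. by log-likelihood) and pick the highest-scoring one.

```python
# A minimal sketch of a multiple-choice factuality evaluation on TruthfulQA.
# `ask_model` is a placeholder: plug in your own LLM call that returns the
# index of the answer the model selects.

import random

from datasets import load_dataset


def ask_model(question: str, choices: list[str]) -> int:
    """Placeholder model call: picks a random choice.
    Replace with code that scores each choice with your LLM."""
    return random.randrange(len(choices))


def truthfulqa_mc1_accuracy(n_questions: int = 50) -> float:
    """Fraction of questions where the model picks the single correct answer (MC1)."""
    data = load_dataset("truthful_qa", "multiple_choice", split="validation")
    correct = 0
    for row in data.select(range(n_questions)):
        choices = row["mc1_targets"]["choices"]
        labels = row["mc1_targets"]["labels"]  # 1 for the true answer, 0 otherwise
        picked = ask_model(row["question"], choices)
        correct += labels[picked]
    return correct / n_questions


if __name__ == "__main__":
    print(f"MC1 accuracy: {truthfulqa_mc1_accuracy():.2%}")
```

A similar loop over HaluEval's QA samples, which pair a question and supporting knowledge with a correct and a hallucinated answer, would target faithfulness rather than factuality.
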