Lies, damned lies and large language models — Jodie Burchell
Explore the types of LLM hallucinations, how they have evolved across successive GPT models, and practical methods for reducing false outputs. Learn how to measure and mitigate AI inaccuracies.
-
Two main types of LLM hallucinations exist:
- Faithfulness hallucinations - output that deviates from or contradicts the provided source text/context
- Factuality hallucinations - output that states incorrect facts about the world
-
GPT model evolution shows increasing capabilities:
- GPT-1 (120M parameters): Basic grammar
- GPT-2: More sophisticated text completion
- GPT-3+: Ability to encode knowledge and generate coherent content
-
Training data quality significantly impacts hallucination rates:
- Early models relied heavily on unfiltered CommonCrawl data
- Modern approaches use filtered sources (C4, RefinedWeb) - see the filtering sketch after this list
- Higher quality input data generally leads to better performance
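
As an illustration of that last point, here is a minimal sketch of C4-style heuristic quality filtering; the thresholds, the regular expression, and the placeholder `raw_docs` input are illustrative assumptions, not the actual C4 pipeline.

```python
import re

def passes_quality_filters(doc: str) -> bool:
    """Illustrative C4-style heuristics (assumed thresholds, not the real pipeline)."""
    words = doc.split()
    if len(words) < 50:                      # drop very short documents
        return False
    lines = [l.strip() for l in doc.splitlines() if l.strip()]
    # Most lines should read like complete sentences (end in terminal punctuation).
    sentence_like = [l for l in lines if l.endswith((".", "!", "?", '"'))]
    if len(sentence_like) / len(lines) < 0.8:
        return False
    # Drop pages dominated by boilerplate phrases.
    if re.search(r"lorem ipsum|cookie policy|enable javascript", doc, re.IGNORECASE):
        return False
    return True

raw_docs = ["<scraped page text>", "<another scraped page>"]  # placeholder CommonCrawl pages
filtered = [doc for doc in raw_docs if passes_quality_filters(doc)]
```

Real pipelines such as C4 and RefinedWeb layer deduplication and language identification on top of heuristics like these.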
-
Methods to reduce hallucinations include:
- Careful prompt engineering
- Fine-tuning on specific domains
- Retrieval-Augmented Generation (RAG) - see the sketch after this list
- Self-refinement and collaborative refinement
- Using multiple models to cross-validate outputs
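
To illustrate the RAG idea, below is a minimal sketch that grounds a prompt in documents retrieved with TF-IDF similarity; the toy document store, the prompt template, and the `call_llm` placeholder are assumptions for illustration only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy document store standing in for a real knowledge base.
documents = [
    "Retrieval-Augmented Generation grounds answers in retrieved source documents.",
    "TruthfulQA is a benchmark that probes models for common misconceptions.",
    "C4 is a filtered version of CommonCrawl used to train language models.",
]

vectorizer = TfidfVectorizer().fit(documents)
doc_vectors = vectorizer.transform(documents)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query by TF-IDF cosine similarity."""
    scores = cosine_similarity(vectorizer.transform([query]), doc_vectors)[0]
    return [documents[i] for i in scores.argsort()[::-1][:k]]

question = "What does TruthfulQA measure?"
context = "\n".join(retrieve(question))

# The grounded prompt is then sent to whatever LLM client you use;
# `call_llm` is a placeholder, not a real API.
prompt = (
    "Answer using ONLY the context below. "
    "If the answer is not in the context, say you don't know.\n\n"
    f"Context:\n{context}\n\nQuestion: {question}"
)
# answer = call_llm(prompt)
```

The point is that the model answers from retrieved evidence rather than from its parametric memory, which is what helps reduce factuality hallucinations.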
-
Measuring hallucination rates:
- Multiple evaluation datasets exist (TruthfulQA, HaluEval, SQuAD)
- TruthfulQA specifically tests for common misconceptions - see the loading sketch after this list
- Current models still show significant hallucination rates (~30-40%)
- Measurement methods need to be specific to the use case and domain
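
A minimal sketch of running one such benchmark, loading TruthfulQA's multiple-choice split with the Hugging Face `datasets` library; `pick_answer` is a random-baseline placeholder where the model under test would score the candidate answers.

```python
import random
from datasets import load_dataset

# TruthfulQA's multiple-choice config: each question has candidate answers,
# with binary labels marking which are truthful.
truthful_qa = load_dataset("truthful_qa", "multiple_choice")["validation"]

def pick_answer(question: str, choices: list[str]) -> int:
    """Placeholder: replace with your model's choice (e.g. highest log-likelihood option)."""
    return random.randrange(len(choices))

correct = 0
for row in truthful_qa:
    choices = row["mc1_targets"]["choices"]
    labels = row["mc1_targets"]["labels"]   # 1 = truthful, 0 = common misconception
    if labels[pick_answer(row["question"], choices)] == 1:
        correct += 1

print(f"MC1 accuracy: {correct / len(truthful_qa):.1%}")
```

Swapping in a different benchmark (e.g. HaluEval for faithfulness) follows the same pattern, which is why the measurement method should match your own use case and domain.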
-
Large context windows help reduce inconsistencies but don’t eliminate hallucinations
-
Trade-offs exist between model size, performance, and hallucination rates
-
Critical evaluation needed when assessing model performance claims