Talks - Jodie Burchell: Lies, damned lies and large language models

Explore how LLMs compress data, leading to hallucinations, and learn practical strategies to improve accuracy. Covers RAG, prompt engineering, and evaluation methods.

Key takeaways
  • LLMs are essentially doing “lossy compression” of their training data, leading to information loss and potential hallucinations

  • Two main types of hallucinations:

    • Faithfulness hallucinations: the model's output deviates from or contradicts the context it was given
    • Factuality hallucinations: the model generates claims that are factually incorrect, regardless of any supplied context
  • Common data quality issues contributing to hallucinations:

    • Training data containing misinformation and conspiracy theories
    • Low-quality sources
    • Inadequately filtered web content
    • Outdated information
  • Key methods to reduce hallucinations:

    • Retrieval Augmented Generation (RAG)
    • Better prompt engineering (a grounding-prompt sketch follows this list)
    • Domain-specific datasets
    • Self-refinement and collaborative refinement
    • Improved data filtering
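
One common prompt-engineering tactic against hallucinations is to instruct the model to answer only from supplied context and to say when it does not know. The sketch below is illustrative only; the template wording and the example context are assumptions, not material from the talk.

```python
# A minimal sketch of a hallucination-aware prompt template (illustrative;
# the exact wording and how you call the model are assumptions).

GROUNDED_PROMPT = """You are a careful assistant.
Answer the question using ONLY the context below.
If the context does not contain the answer, reply exactly: "I don't know."

Context:
{context}

Question: {question}
Answer:"""


def build_prompt(context: str, question: str) -> str:
    """Fill the template so the model is steered away from inventing facts."""
    return GROUNDED_PROMPT.format(context=context.strip(), question=question.strip())


if __name__ == "__main__":
    context = "The Eiffel Tower is 330 metres tall and is located in Paris, France."
    print(build_prompt(context, "How tall is the Eiffel Tower?"))
```
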
  • RAG implementation considerations (a minimal pipeline sketch follows this list):

    • Document chunk size
    • Choice of embedding model
    • Retrieval method
    • Vector database selection
    • Prompt construction
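
The sketch below ties those considerations together in one minimal retrieval pipeline. The chunk size, the `all-MiniLM-L6-v2` embedding model from `sentence-transformers`, the brute-force cosine retrieval (standing in for a vector database), and the prompt wording are all illustrative assumptions rather than the talk's recommendations.

```python
# Minimal RAG sketch: chunking, embedding, retrieval, prompt construction.
# A production setup would swap the brute-force search for a vector database.

import numpy as np
from sentence_transformers import SentenceTransformer

CHUNK_SIZE = 200   # characters per chunk: a tuning knob, not a recommendation
TOP_K = 3          # number of chunks to retrieve

model = SentenceTransformer("all-MiniLM-L6-v2")  # choice of embedding model


def chunk(text: str, size: int = CHUNK_SIZE) -> list[str]:
    """Naive fixed-size chunking; overlap or sentence-aware splitting are common refinements."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def retrieve(query: str, chunks: list[str], embeddings: np.ndarray, k: int = TOP_K) -> list[str]:
    """Brute-force cosine-similarity retrieval (stand-in for a vector database)."""
    q = model.encode([query])[0]
    sims = embeddings @ q / (np.linalg.norm(embeddings, axis=1) * np.linalg.norm(q))
    return [chunks[i] for i in np.argsort(sims)[::-1][:k]]


def build_rag_prompt(query: str, retrieved: list[str]) -> str:
    """Prompt construction: ground the model in the retrieved context only."""
    context = "\n---\n".join(retrieved)
    return (
        "Answer using only the context below. If the answer is not there, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


if __name__ == "__main__":
    document = (
        "Retrieval Augmented Generation grounds an LLM's answers in retrieved documents. "
        "Instead of relying only on what the model memorised during training, relevant "
        "text chunks are looked up at query time and placed into the prompt."
    )
    question = "How does RAG reduce reliance on memorised training data?"
    chunks = chunk(document)
    embeddings = model.encode(chunks)
    print(build_rag_prompt(question, retrieve(question, chunks, embeddings)))
```

Each of these choices (chunking strategy, embedding model, retriever, vector store, prompt) affects how often the model answers from retrieved evidence rather than from its compressed parametric memory.
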
  • Model size trends:

    • GPT-1: 120 million parameters
    • GPT-3: 175 billion parameters
    • GPT-4: an estimated 1 trillion parameters (size not officially disclosed)
    • Larger models can encode more information but remain prone to hallucinations
  • Measuring hallucination rates (an evaluation sketch follows this list):

    • TruthfulQA dataset for factuality
    • HaluEval (QA subset) for faithfulness
    • Multiple choice vs. open-ended evaluation
    • Need for domain-specific evaluation metrics
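
As a concrete example of multiple-choice factuality evaluation, the sketch below computes an MC1-style accuracy on TruthfulQA via the Hugging Face `datasets` library. The `ask_model` stub is a placeholder assumption; a real harness would score each candidate answer with the model under test (e.g. by log-likelihood) and pick the highest-scoring one.

```python
# A minimal sketch of a multiple-choice factuality evaluation on TruthfulQA.
# `ask_model` is a placeholder: plug in your own LLM call that returns the
# index of the answer the model selects.

import random

from datasets import load_dataset


def ask_model(question: str, choices: list[str]) -> int:
    """Placeholder model call: picks a random choice.
    Replace with code that scores each choice with your LLM."""
    return random.randrange(len(choices))


def truthfulqa_mc1_accuracy(n_questions: int = 50) -> float:
    """Fraction of questions where the model picks the single correct answer (MC1)."""
    data = load_dataset("truthful_qa", "multiple_choice", split="validation")
    correct = 0
    for row in data.select(range(n_questions)):
        choices = row["mc1_targets"]["choices"]
        labels = row["mc1_targets"]["labels"]  # 1 for the true answer, 0 otherwise
        picked = ask_model(row["question"], choices)
        correct += labels[picked]
    return correct / n_questions


if __name__ == "__main__":
    print(f"MC1 accuracy: {truthfulqa_mc1_accuracy():.2%}")
```

A similar loop over HaluEval's QA samples, which pair a question and supporting knowledge with a correct and a hallucinated answer, would target faithfulness rather than factuality.
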