Dean Pleban - Customizing and Evaluating LLMs, an Ops Perspective | PyData Global 2023

Discover operational perspectives on customizing and evaluating Large Language Models, including strategies like prompt engineering, RAG, LoRA, and PEFT, as well as best practices for fine-tuning, retraining, and evaluating these powerful AI models.

Key takeaways
  • Customizing LLMs is essential for high-stakes applications, such as medical diagnosis and legal cases.
  • There are various strategies for customization, including prompt engineering, reference- and preference-based methods, and RAG (retrieval-augmented generation).
  • RAG updates the model's context with retrieved knowledge, which is simpler than updating the model itself (see the sketch after this list).
  • LoRA (Low-Rank Adaptation) and PEFT (parameter-efficient fine-tuning) are often overlooked but powerful methods for customization (see the sketch after this list).
  • Fine-tuning and retraining update the model's weights on new data, a more comprehensive (and more expensive) form of customization.
  • Human judgment and feedback are essential for evaluating LLMs.
  • RAG integrates well with other customization techniques.
  • Customization is needed for both industry teams and smaller startups.
  • Well-chosen metrics are essential for evaluating LLMs, whichever customization approach (LoRA, PEFT, RAG) is used (see the evaluation sketch after this list).
  • It is crucial to ensure that the evaluation data is representative, free of bias, and covers the necessary dimensions.
  • Output validation and collecting feedback from production are also important.
  • Customizing LLMs requires expertise in prompting, reference- and preference-based methods, and RAG.
  • Open-source libraries, such as Hugging Face's PEFT (which implements LoRA), provide scaffolding to help with customization, and evaluation libraries offer many metrics out of the box.
  • Tooling is essential for evaluating LLMs.
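
As a companion to the RAG takeaway above, here is a minimal, library-free sketch of the retrieval-augmented generation pattern: retrieve the most relevant documents for a query and inject them into the prompt, so the knowledge base can be updated without touching the model's weights. `call_llm`, `Doc`, and the embedding inputs are illustrative placeholders, not APIs from the talk.

```python
# Minimal RAG sketch: retrieve relevant documents, then prompt the model
# with them as context. All names here are illustrative placeholders.

def call_llm(prompt: str) -> str:
    """Stand-in for any chat/completion API call."""
    raise NotImplementedError

class Doc:
    def __init__(self, text: str, embedding: list[float]):
        self.text = text            # document content to inject into the prompt
        self.embedding = embedding  # vector from any embedding model

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm if norm else 0.0

def retrieve(query_emb: list[float], docs: list[Doc], k: int = 3) -> list[Doc]:
    # Rank documents by cosine similarity to the query embedding.
    return sorted(docs, key=lambda d: cosine(query_emb, d.embedding), reverse=True)[:k]

def rag_answer(question: str, query_emb: list[float], docs: list[Doc]) -> str:
    context = "\n".join(d.text for d in retrieve(query_emb, docs))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return call_llm(prompt)
```

Because only the document store changes, new knowledge can be added or corrected without any retraining.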
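The LoRA/PEFT takeaway can be made concrete with the Hugging Face `peft` library, which wraps a base model so that only small low-rank adapter matrices are trained. The model name and hyperparameters below are illustrative choices, not values from the talk.

```python
# LoRA via the Hugging Face `peft` library: only the low-rank adapter
# weights are trained, while the base model stays frozen.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("gpt2")  # illustrative base model

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                         # rank of the low-rank update matrices
    lora_alpha=16,               # scaling applied to the LoRA updates
    lora_dropout=0.05,
    target_modules=["c_attn"],   # attention projection module in GPT-2
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights are trainable
# ...train `model` on the new data with a standard training loop or Trainer...
```

Training only the adapters keeps memory and compute requirements far below full fine-tuning, which is why these methods fit both large industry teams and smaller startups.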
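Finally, a minimal sketch of the evaluation loop implied by the metrics and feedback takeaways: run the customized model over a representative, held-out eval set, compute an automatic metric per example, and leave room for human judgment. The token-overlap F1 metric and the `generate` callable are illustrative stand-ins, not the talk's specific tooling.

```python
# Evaluation sketch: automatic metric per example plus a slot for human review.
from statistics import mean

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1; a stand-in for whatever task-specific metric applies."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    common = set(pred) & set(ref)
    if not common:
        return 0.0
    precision = len(common) / len(pred)
    recall = len(common) / len(ref)
    return 2 * precision * recall / (precision + recall)

def evaluate(generate, eval_set):
    # `eval_set` should be representative and cover the dimensions that matter.
    rows = []
    for example in eval_set:
        output = generate(example["prompt"])   # the customized LLM under test
        rows.append({
            "prompt": example["prompt"],
            "output": output,
            "f1": token_f1(output, example["reference"]),
            "human_score": None,               # to be filled in by reviewers
        })
    print("mean token F1:", mean(r["f1"] for r in rows))
    return rows
```

The same table of rows can then be validated against output constraints and compared with feedback collected from production.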