Dean Pleban - Customizing and Evaluating LLMs, an Ops Perspective | PyData Global 2023

Discover operational perspectives on customizing and evaluating Large Language Models, including strategies like prompt engineering, RAG, LoRA, and PEFT, as well as best practices for fine-tuning, retraining, and evaluating these powerful AI models.

Key takeaways
  • Customizing LLMs is essential for high-stakes applications, such as medical diagnosis and legal cases.
  • There are various strategies for customization, including prompt engineering, reference- and preference-based methods, and RAG (retrieval-augmented generation).
  • RAG updates the model's context with retrieved knowledge, which is simpler than updating the model itself (see the sketch after this list).
  • LoRA (Low-Rank Adaptation) and PEFT (parameter-efficient fine-tuning) are often overlooked but powerful methods for customization (see the sketch after this list).
  • Fine-tuning and retraining update the model's weights on new data, a more comprehensive (and more expensive) form of customization.
  • Human judgment and feedback are essential for evaluating LLMs.
  • RAG integrates well with other customization techniques.
  • Customization is needed for both industry teams and smaller startups.
  • Well-chosen metrics are essential for evaluating LLMs, whichever customization approach (LoRA, PEFT, RAG) is used (see the evaluation sketch after this list).
  • It is crucial to ensure that the evaluation data is representative, free of bias, and covers the necessary dimensions.
  • Output validation and collecting feedback from production are also important.
  • Customizing LLMs requires expertise in prompting, reference- and preference-based methods, and RAG.
  • Open-source libraries, such as Hugging Face's PEFT (which implements LoRA), provide scaffolding to help with customization, and evaluation libraries offer many metrics out of the box.
  • Tooling is essential for evaluating LLMs.
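
As a companion to the RAG takeaway above, here is a minimal, library-free sketch of the retrieval-augmented generation pattern: retrieve the most relevant documents for a query and inject them into the prompt, so the knowledge base can be updated without touching the model's weights. `call_llm`, `Doc`, and the embedding inputs are illustrative placeholders, not APIs from the talk.

```python
# Minimal RAG sketch: retrieve relevant documents, then prompt the model
# with them as context. All names here are illustrative placeholders.

def call_llm(prompt: str) -> str:
    """Stand-in for any chat/completion API call."""
    raise NotImplementedError

class Doc:
    def __init__(self, text: str, embedding: list[float]):
        self.text = text            # document content to inject into the prompt
        self.embedding = embedding  # vector from any embedding model

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm if norm else 0.0

def retrieve(query_emb: list[float], docs: list[Doc], k: int = 3) -> list[Doc]:
    # Rank documents by cosine similarity to the query embedding.
    return sorted(docs, key=lambda d: cosine(query_emb, d.embedding), reverse=True)[:k]

def rag_answer(question: str, query_emb: list[float], docs: list[Doc]) -> str:
    context = "\n".join(d.text for d in retrieve(query_emb, docs))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return call_llm(prompt)
```

Because only the document store changes, new knowledge can be added or corrected without any retraining.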
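The LoRA/PEFT takeaway can be made concrete with the Hugging Face `peft` library, which wraps a base model so that only small low-rank adapter matrices are trained. The model name and hyperparameters below are illustrative choices, not values from the talk.

```python
# LoRA via the Hugging Face `peft` library: only the low-rank adapter
# weights are trained, while the base model stays frozen.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("gpt2")  # illustrative base model

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                         # rank of the low-rank update matrices
    lora_alpha=16,               # scaling applied to the LoRA updates
    lora_dropout=0.05,
    target_modules=["c_attn"],   # attention projection module in GPT-2
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights are trainable
# ...train `model` on the new data with a standard training loop or Trainer...
```

Training only the adapters keeps memory and compute requirements far below full fine-tuning, which is why these methods fit both large industry teams and smaller startups.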
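Finally, a minimal sketch of the evaluation loop implied by the metrics and feedback takeaways: run the customized model over a representative, held-out eval set, compute an automatic metric per example, and leave room for human judgment. The token-overlap F1 metric and the `generate` callable are illustrative stand-ins, not the talk's specific tooling.

```python
# Evaluation sketch: automatic metric per example plus a slot for human review.
from statistics import mean

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1; a stand-in for whatever task-specific metric applies."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    common = set(pred) & set(ref)
    if not common:
        return 0.0
    precision = len(common) / len(pred)
    recall = len(common) / len(ref)
    return 2 * precision * recall / (precision + recall)

def evaluate(generate, eval_set):
    # `eval_set` should be representative and cover the dimensions that matter.
    rows = []
    for example in eval_set:
        output = generate(example["prompt"])   # the customized LLM under test
        rows.append({
            "prompt": example["prompt"],
            "output": output,
            "f1": token_f1(output, example["reference"]),
            "human_score": None,               # to be filled in by reviewers
        })
    print("mean token F1:", mean(r["f1"] for r in rows))
    return rows
```

The same table of rows can then be validated against output constraints and compared with feedback collected from production.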