Max Pumperla - Building & Deploying LLM Apps

Discover how to build and deploy applications based on large language models (LLMs) using Ray and open-source models, covering topics such as fine-tuning, scaling, and retrieval augmented generation.

Key takeaways

Large language models (LLMs) are complex systems that require careful consideration for deployment and scaling.
Ray is a flexible distributed Python framework that allows for easy scaling of workloads.
Open-source models are becoming stronger and can be fine-tuned and customized.
Documentor is a GitHub bot that uses LLMs to improve writing and can be used for tasks such as code generation and documentation.
Fine-tuning LLMs can be useful for specific use cases, but may not always be necessary.
Ray’s primitives allow for easy scaling of Python code and can be used for tasks such as hyperparameter tuning and optimization.
Vector databases can be used to store and retrieve text data for use with LLMs.
Retrieval augmented generation (RAG) is a technique that combines retrieval and generation for improved results.
LLMs can be used for a variety of tasks, including code generation, documentation, and summarization.
Scaling LLMs can be complex and requires careful consideration of factors such as cost, speed, and quality.
OpenAI’s GPT-3.5 Turbo and Llama 2 models are similar in many respects and can be used for different tasks.
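
The vector database and retrieval augmented generation takeaways above can be illustrated with a minimal sketch. It is not the implementation shown in the talk: the `ToyVectorStore` class, the `embed`, `cosine`, and `build_prompt` helpers, and the sample documents are all hypothetical names chosen for illustration. The bag-of-words "embedding" is a stand-in assumption for a real embedding model, and the in-memory list stands in for a real vector database. The final prompt would be passed to whichever LLM you use.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding (assumption): word counts instead of a learned vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self):
        self._items = []  # list of (text, embedding) pairs

    def add(self, text):
        self._items.append((text, embed(text)))

    def search(self, query, k=2):
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(
            self._items, key=lambda item: cosine(q, item[1]), reverse=True
        )
        return [text for text, _ in ranked[:k]]


def build_prompt(question, store, k=2):
    """Retrieval step of RAG: put the retrieved text in front of the question."""
    context = "\n".join(f"- {doc}" for doc in store.search(question, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


store = ToyVectorStore()
store.add("Ray is a distributed Python framework for scaling workloads.")
store.add("Vector databases store text as embeddings for similarity search.")
store.add("Fine-tuning adapts a pretrained model to a specific use case.")

prompt = build_prompt("What is Ray?", store, k=1)
print(prompt)
```

The design point is that the model itself is not changed: relevant text is looked up at query time and placed in the prompt, which is one reason fine-tuning may not always be necessary.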
