Andrey Cheptsov - Leveraging open-source LLMs for production | PyData Global 2023

Learn how to leverage open-source LLMs in production with guidance on model selection, optimization techniques, fine-tuning approaches, and deployment considerations.

Key takeaways
  • Open-source LLMs offer full control over model behavior, data processing, and privacy compared to proprietary models

  • Key advantages of open-source LLMs:

    • Cost reduction potential through optimization
    • Customization flexibility
    • No vendor lock-in risk
    • Full privacy control
    • Community-driven improvements
  • Major open-source LLMs (a loading sketch follows this list):

    • LLaMA 2 (Meta/Microsoft) - available in 7B, 13B, and 70B parameter variants
    • Code LLaMA - specialized for code generation
    • Mistral - efficient 7B model with strong performance
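As a point of reference, here is a minimal sketch of loading one of these models with the Hugging Face transformers library; the checkpoint name and generation settings are illustrative assumptions, not taken from the talk:

```python
# Illustrative only: load an open-source checkpoint and generate text with
# the Hugging Face transformers library.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.1"  # assumed example checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit on a single GPU
    device_map="auto",          # let accelerate place layers on available devices
)

prompt = "Explain in one sentence why open-source LLMs matter in production."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```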
  • Technical optimization techniques:

    • LoRA (Low-Rank Adaptation) trains small low-rank adapter matrices instead of the full weights, reducing memory requirements for training
    • Quantization converts weights to lower-precision formats (e.g., int8, int4), shrinking the memory footprint
    • Combining LoRA with quantization can reduce memory needs by 10x (see the sketch after this list)
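A rough sketch of how LoRA and 4-bit quantization are commonly combined (a QLoRA-style setup) using transformers, bitsandbytes, and peft; the base model, rank, and target modules below are illustrative assumptions rather than values from the talk:

```python
# Illustrative only: load the base model in int4 and attach LoRA adapters,
# so only the small adapter matrices are trained.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # int4 weights
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16
    bnb_4bit_quant_type="nf4",
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",             # assumed base model
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,                                   # low-rank dimension
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],    # attention projections, a common choice
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()          # only the LoRA matrices are trainable
```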
  • Fine-tuning approaches:

    • Supervised fine-tuning for basic task adaptation
    • RLHF (Reinforcement Learning from Human Feedback) for instruction following
    • DPO (Direct Preference Optimization) as a simpler alternative to RLHF (see the sketch after this list)
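A hedged sketch of supervised fine-tuning and DPO using the TRL library (API as of the late-2023 releases); the model id, dataset files, and hyperparameters are placeholders, not details from the talk:

```python
# Illustrative only: both trainers come from TRL; newer TRL versions move
# some of these arguments into dedicated config objects.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer, SFTTrainer

model_id = "meta-llama/Llama-2-7b-hf"       # assumed base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token   # Llama tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_id)
args = TrainingArguments(output_dir="out", per_device_train_batch_size=2, num_train_epochs=1)

# 1) Supervised fine-tuning on plain prompt/response text (a "text" column).
sft_data = load_dataset("json", data_files="sft_train.jsonl", split="train")
SFTTrainer(model=model, tokenizer=tokenizer, args=args,
           train_dataset=sft_data, dataset_text_field="text").train()

# 2) DPO on preference pairs ("prompt", "chosen", "rejected" columns); it skips
#    the separate reward model and RL loop that full RLHF requires.
dpo_data = load_dataset("json", data_files="dpo_pairs.jsonl", split="train")
DPOTrainer(model=model, ref_model=None, beta=0.1, args=args,
           train_dataset=dpo_data, tokenizer=tokenizer).train()
```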
  • Production deployment considerations:

    • Memory requirements vary significantly by model size
    • Batching inference requests can improve throughput and, under load, latency (see the sketch after this list)
    • Multiple models can be served using shared resources
    • Commercial usage rights vary by model license
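A small sketch of batched inference with the vLLM serving engine, one common option for serving a shared open-source model; the checkpoint and sampling settings are assumptions for illustration:

```python
# Illustrative only: one engine instance serves many prompts as a batch.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.1")  # assumed checkpoint
params = SamplingParams(temperature=0.7, max_tokens=128)

prompts = [
    "Write a SQL query that counts orders per customer.",
    "Summarize the benefits of open-source LLMs in one sentence.",
]

# A single generate() call batches the prompts on the GPU rather than running
# them one by one, which raises throughput on shared hardware.
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```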
  • Benchmarks show open-source models approaching GPT-3.5/GPT-4 quality on specific tasks through fine-tuning

  • Common use cases:

    • Code generation
    • SQL query generation
    • Structured data extraction (illustrated after this list)
    • API automation
    • Synthetic dataset generation
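As one illustration of the structured data extraction use case, a hedged sketch that prompts a local open-source model for JSON and parses the result; the model id and prompt are assumptions, not examples from the talk:

```python
# Illustrative only: ask the model for JSON, then parse it with the standard library.
import json
from transformers import pipeline

generator = pipeline("text-generation",
                     model="mistralai/Mistral-7B-Instruct-v0.1",  # assumed checkpoint
                     device_map="auto")

prompt = (
    "Extract the fields as JSON with keys name, date, amount.\n"
    'Text: "Invoice from Acme Corp dated 2023-11-02 for $1,250."\nJSON:'
)
raw = generator(prompt, max_new_tokens=64, return_full_text=False)[0]["generated_text"]
record = json.loads(raw.strip())  # in practice, add validation and retry on parse errors
print(record)
```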