Elijah ben Izzy & Stefan Krawczyk - Bridging Classic ML Pipelines with the World of LLMs

Learn how to bridge classic ML and LLM pipelines using DAGs and the Hamilton framework. Discover shared patterns, key differences, and best practices for building unified ML systems.

Key takeaways
  • DAGs (Directed Acyclic Graphs) are a crucial abstraction that can effectively model both traditional ML and LLM pipelines

  • Hamilton is a micro-orchestration framework that lets you define DAGs as declarative Python functions, where each function is a node and its parameter names declare its dependencies, enabling modular, testable, and self-documenting pipelines (see the sketch below)
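
A minimal runnable sketch of the idea, assuming Hamilton's Builder driver API; the column names (spend, signups) and the transformations are illustrative, not from the talk:

```python
import pandas as pd

from hamilton import driver
from hamilton.ad_hoc_utils import create_temporary_module


def spend_per_signup(spend: pd.Series, signups: pd.Series) -> pd.Series:
    """Node `spend_per_signup`; depends on the `spend` and `signups` inputs."""
    return spend / signups


def spend_zero_mean(spend: pd.Series) -> pd.Series:
    """Node `spend_zero_mean`; Hamilton wires it to `spend` by parameter name."""
    return spend - spend.mean()


# Bundle the functions into a module and build the DAG from it.
module = create_temporary_module(spend_per_signup, spend_zero_mean)
dr = driver.Builder().with_modules(module).build()

result = dr.execute(
    ["spend_per_signup", "spend_zero_mean"],
    inputs={
        "spend": pd.Series([10.0, 20.0, 30.0]),
        "signups": pd.Series([1, 2, 5]),
    },
)
print(result)  # a dict mapping node names to the computed Series
```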

  • LLM and classic ML pipelines share structural patterns and engineering challenges:

    • Both require observability, evaluation, and productionization
    • Both can be modeled as DAGs of computational steps
    • Both need proper versioning and testing
  • Key differences between LLM and classic ML pipelines:

    • LLMs typically require GPUs for serving
    • LLM pipelines involve less feature engineering but more prompt engineering
    • LLM evaluation tends to be fuzzier because outputs are free-form text
  • Benefits of using Hamilton for ML/LLM pipelines:

    • Easy swapping between components and implementations
    • Built-in testing and debugging capabilities
    • Code reusability and modularity
    • Support for both batch and online implementations
    • Integration with existing tools (Airflow, Metaflow, etc.)
  • The framework emphasizes software engineering best practices:

    • Self-contained, modular components
    • Clear dependency management
    • Easy testing and debugging
    • Documentation through code structure
  • Pipelines can combine both ML and LLM components, using the @config.when decorator to switch flexibly between implementations (see the sketch below)
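
A hedged sketch of @config.when-based switching, assuming Hamilton's function_modifiers API; the node name, the config key (model_type), and the stubbed model calls are illustrative, not from the talk:

```python
from hamilton import driver
from hamilton.ad_hoc_utils import create_temporary_module
from hamilton.function_modifiers import config


@config.when(model_type="classic")
def sentiment__classic(document: str) -> str:
    """Classic ML path; a fitted model would go here (stubbed for brevity)."""
    return "positive" if "good" in document else "negative"


@config.when(model_type="llm")
def sentiment__llm(document: str, prompt_template: str) -> str:
    """LLM path; a real client call would go here (stubbed for brevity)."""
    prompt = prompt_template.format(document=document)
    return f"LLM verdict for prompt: {prompt!r}"


# The `__suffix` naming resolves both functions to a single `sentiment` node;
# the config passed to the driver decides which implementation runs.
module = create_temporary_module(sentiment__classic, sentiment__llm)
dr = (
    driver.Builder()
    .with_modules(module)
    .with_config({"model_type": "llm"})
    .build()
)
print(dr.execute(
    ["sentiment"],
    inputs={"document": "a good movie", "prompt_template": "Classify: {document}"},
))
```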

  • Prompts in LLM pipelines can be treated much like hyperparameters in traditional ML pipelines (see the sweep sketch below)
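
A minimal sketch of that analogy: sweeping prompt templates (and a sampling temperature) the way you would sweep a hyperparameter grid. run_pipeline and score are hypothetical stubs, not Hamilton or talk APIs:

```python
from itertools import product

prompt_templates = [
    "Summarize: {document}",
    "Summarize in one sentence: {document}",
]
temperatures = [0.0, 0.7]


def run_pipeline(template: str, temperature: float) -> str:
    """Stub for a pipeline run (e.g. a Hamilton driver call hitting an LLM)."""
    return f"output(template={template!r}, temperature={temperature})"


def score(output: str) -> float:
    """Stub evaluation; in practice an eval suite, grader model, or human review."""
    return float(len(output))


# Grid search over prompts, exactly as you would over learning rates.
results = {
    (template, temp): score(run_pipeline(template, temp))
    for template, temp in product(prompt_templates, temperatures)
}
best_config = max(results, key=results.get)
print("best (template, temperature):", best_config)
```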

  • Vector databases and embedding operations work similarly for both LLM and classic ML use cases (see the retrieval sketch below)
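
A hedged sketch of a shared embedding-and-retrieval step; the toy deterministic embedding and in-memory cosine search stand in for a real embedding model and vector database client:

```python
import numpy as np

corpus = ["shipping policy", "refund policy", "warranty terms"]


def embed(texts: list[str]) -> np.ndarray:
    """Toy deterministic embedding (character-code histogram); illustrative only."""
    return np.array(
        [[sum(ord(c) for c in t if ord(c) % 8 == i) for i in range(8)] for t in texts],
        dtype=float,
    )


index = embed(corpus)  # in a real system, these rows would live in a vector database


def top_k(query: str, k: int = 2) -> list[str]:
    """Cosine-similarity retrieval; identical whether the consumer is classic ML or an LLM."""
    q = embed([query])[0]
    sims = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q) + 1e-9)
    return [corpus[i] for i in np.argsort(-sims)[:k]]


# Classic ML: retrieved neighbors become features (e.g. nearest-neighbor labels).
# LLM: the same neighbors are injected into the prompt as context (RAG).
print(top_k("what is the refund policy?"))
```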

  • The field requires tools that can handle rapid changes and iterations, especially in the LLM space