Juan Luis Cano Rodríguez - Who needs ChatGPT? Rock solid AI pipelines with Hugging Face and Kedro

Learn how to build production-ready ML pipelines by combining Kedro's software engineering best practices with Hugging Face's state-of-the-art AI models and tools.

Key takeaways
  • Kedro is an open source framework that applies software engineering best practices to data science and ML pipelines, helping teams move from experimental code to production

  • The framework decouples I/O operations from computation by separating datasets (inputs/outputs) from nodes (computation steps), making pipelines more maintainable

  • Kedro projects follow a standardized template structure, with clean separation between configuration, data, notebooks and source code

  • The data catalog provides a declarative way to define datasets and their locations (local files, S3, databases, etc.), abstracting away data access details

  • Pipelines are defined as directed acyclic graphs (DAGs) of nodes, with clear dependencies between computation steps

  • Kedro integrates with major orchestration platforms such as Airflow, Argo and Kubeflow while remaining orchestrator-agnostic

  • The framework supports experiment tracking and can connect with MLflow through plugins

  • Kedro is extensible through hooks and plugins, following a pattern similar to pytest (both are built on the pluggy plugin system)

  • While not a full MLOps solution, Kedro focuses on providing solid foundations for building maintainable ML pipelines

  • The project is now part of the Linux Foundation AI & Data, with multiple stakeholders including McKinsey, Société Générale and others
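To make the separation between datasets, nodes and the DAG more concrete, here is a toy sketch in plain Python. It is not Kedro's API: `Node`, `InMemoryCatalog` and `run` are hypothetical names chosen for illustration (in Kedro itself, `node`, `Pipeline` and `DataCatalog` fill these roles). Nodes are pure functions that only declare the dataset names they read and write, the catalog owns all I/O, and the runner derives the execution order from those declarations.

```python
from __future__ import annotations

from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Any, Callable


@dataclass(frozen=True)
class Node:
    """Hypothetical node: a pure function plus the dataset names it reads and writes."""
    func: Callable[..., Any]
    inputs: tuple[str, ...]
    outputs: str


class InMemoryCatalog:
    """Hypothetical stand-in for a data catalog: maps dataset names to values.

    A real catalog would map names to datasets backed by files, S3, databases, etc.
    """

    def __init__(self, initial: dict[str, Any] | None = None) -> None:
        self._data = dict(initial or {})

    def load(self, name: str) -> Any:
        return self._data[name]

    def save(self, name: str, value: Any) -> None:
        self._data[name] = value


def run(nodes: list[Node], catalog: InMemoryCatalog) -> None:
    # A node depends on whichever nodes produce its inputs; this forms the DAG.
    producers = {n.outputs: n for n in nodes}
    graph = {n: {producers[i] for i in n.inputs if i in producers} for n in nodes}
    # Execute in dependency order; nodes never touch storage directly.
    for n in TopologicalSorter(graph).static_order():
        args = [catalog.load(i) for i in n.inputs]
        catalog.save(n.outputs, n.func(*args))


def clean(raw: list[str]) -> list[str]:
    return [s.strip().lower() for s in raw]


def count_words(texts: list[str]) -> int:
    return sum(len(t.split()) for t in texts)


nodes = [
    Node(count_words, ("clean_texts",), "word_count"),  # listed first on purpose
    Node(clean, ("raw_texts",), "clean_texts"),
]
catalog = InMemoryCatalog({"raw_texts": ["  Hello Kedro ", "Hugging Face  models "]})
run(nodes, catalog)
print(catalog.load("word_count"))  # 5
```

Because the order is computed from declared inputs and outputs, the list order of the nodes does not matter, and swapping the in-memory catalog for one backed by real storage would not require touching `clean` or `count_words`.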
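The hook-based extensibility mentioned above can be illustrated with a greatly simplified, pure-Python sketch. The hook names `before_node_run` and `after_node_run` mirror hook specifications that Kedro defines, but the signatures here are simplified, and `HookManager`, `NodeLogger` and `run_node` are hypothetical helpers. Real Kedro hooks are methods decorated with `hook_impl` and dispatched by pluggy. The point is that the runner only announces lifecycle events, and plugins add behaviour such as logging or experiment tracking without changing pipeline code.

```python
from __future__ import annotations

from typing import Any, Callable


class HookManager:
    """Hypothetical dispatcher: calls a named method on every registered plugin
    that implements it (a much-simplified version of the pluggy idea)."""

    def __init__(self) -> None:
        self._plugins: list[object] = []

    def register(self, plugin: object) -> None:
        self._plugins.append(plugin)

    def call(self, hook_name: str, **kwargs: Any) -> None:
        for plugin in self._plugins:
            impl = getattr(plugin, hook_name, None)
            if callable(impl):
                impl(**kwargs)


class NodeLogger:
    """Example plugin: records which nodes ran, without touching pipeline code."""

    def __init__(self) -> None:
        self.events: list[str] = []

    def before_node_run(self, node_name: str) -> None:
        self.events.append(f"start {node_name}")

    def after_node_run(self, node_name: str, output: Any) -> None:
        self.events.append(f"end {node_name} -> {output!r}")


def run_node(hooks: HookManager, name: str, func: Callable[[], Any]) -> Any:
    # The runner fires hooks around each step; plugins decide what to do.
    hooks.call("before_node_run", node_name=name)
    output = func()
    hooks.call("after_node_run", node_name=name, output=output)
    return output


hooks = HookManager()
logger = NodeLogger()
hooks.register(logger)
run_node(hooks, "tokenize", lambda: ["hello", "kedro"])
print(logger.events)
# ['start tokenize', "end tokenize -> ['hello', 'kedro']"]
```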