Ionut Barbu & Tim Brakenhoff - How to build production-ready data science pipelines with Kedro

Learn how to build production-ready data science pipelines using Kedro, a framework that provides structure, organization, and customization options for data projects.

Key takeaways
  • Attending data science conferences and workshops can help improve code quality and project organization.
  • Kedro provides structure and organization for data science projects.
  • Kedro ships with predefined dataset types for common storage formats such as CSV, Excel, and Parquet.
  • Kedro represents each pipeline as a DAG (Directed Acyclic Graph), which can be visualized with Kedro-Viz.
  • Kedro can be used to create a reproducible, maintainable, and modular data science pipeline.
  • A data catalog is essential for declaring where datasets are stored and for tracking changes to them.
  • Integrate with MLflow to version and track machine learning models (see the tracking sketch after this list).
  • Use YAML files for configuration, such as the data catalog and pipeline parameters.
  • Kedro provides a modular package for building data science pipelines.
  • Kedro supports multiple deployment options, such as AWS, Azure, or manual deployment.
  • Adopting Kedro's structure and organization adds value to a project from the start.
  • Kedro makes it easier to change code safely and to manage intermediate datasets.
  • Kedro provides a way to stitch multiple nodes together into a pipeline (see the pipeline sketch after this list).
  • Kedro can be used to create a structured data catalog (see the catalog.yml sketch after this list).
  • Use Kedro to create high-quality code and useful data engineering pipelines.
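
To make the data catalog takeaway concrete, here is a minimal catalog.yml sketch showing how datasets in different formats might be declared. The dataset names and file paths are made up for illustration, and the exact dataset class names depend on your Kedro version (older releases use pandas.CSVDataSet instead of pandas.CSVDataset, for example):

    # conf/base/catalog.yml -- hypothetical entries for illustration
    companies:
      type: pandas.CSVDataset
      filepath: data/01_raw/companies.csv

    shuttles:
      type: pandas.ExcelDataset
      filepath: data/01_raw/shuttles.xlsx
      load_args:
        engine: openpyxl

    model_input_table:
      type: pandas.ParquetDataset
      filepath: data/03_primary/model_input_table.parquet

Any node in the project can then load or save these datasets by name, while the catalog keeps the actual storage locations in one place.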
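The pipeline sketch below shows how multiple nodes are stitched together in recent Kedro versions. The functions, dataset names ("companies", "preprocessed_companies", and so on), and column names are assumptions for illustration; dataset names must match entries in catalog.yml or be handled as in-memory datasets by Kedro:

    # Hypothetical pipeline: two nodes chained through named datasets.
    import pandas as pd
    from kedro.pipeline import Pipeline, node, pipeline


    def preprocess_companies(companies: pd.DataFrame) -> pd.DataFrame:
        """Example cleaning step: normalise column names."""
        companies.columns = [c.strip().lower() for c in companies.columns]
        return companies


    def create_model_input_table(
        companies: pd.DataFrame, shuttles: pd.DataFrame
    ) -> pd.DataFrame:
        """Example join step producing the table used for modelling."""
        return shuttles.merge(companies, left_on="company_id", right_on="id")


    def create_pipeline() -> Pipeline:
        return pipeline(
            [
                node(
                    preprocess_companies,
                    inputs="companies",
                    outputs="preprocessed_companies",
                    name="preprocess_companies_node",
                ),
                node(
                    create_model_input_table,
                    inputs=["preprocessed_companies", "shuttles"],
                    outputs="model_input_table",
                    name="create_model_input_table_node",
                ),
            ]
        )

Because each node only declares its inputs and outputs by name, Kedro can infer the DAG, run the nodes in the right order, and persist or cache the intermediate datasets according to the catalog.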
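For the MLflow takeaway, here is a minimal sketch of tracking a model from inside a Kedro node using the plain MLflow API (the kedro-mlflow plugin offers tighter integration via hooks and configuration, but the plain API keeps the example self-contained). The model type, parameter names, and the "price" target column are illustrative assumptions:

    import mlflow
    import mlflow.sklearn
    import pandas as pd
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import r2_score


    def train_model(model_input_table: pd.DataFrame) -> LinearRegression:
        """Train a model and log parameters, metrics, and the artifact to MLflow."""
        X = model_input_table.drop(columns=["price"])
        y = model_input_table["price"]

        with mlflow.start_run(run_name="train_model"):
            model = LinearRegression()
            model.fit(X, y)

            # Log run details so every pipeline execution is versioned and comparable.
            mlflow.log_param("model_type", "LinearRegression")
            mlflow.log_metric("r2_train", r2_score(y, model.predict(X)))
            mlflow.sklearn.log_model(model, artifact_path="model")

        return model

Registering this function as a node (with "model_input_table" as input) ties model versioning to the same pipeline runs that produce the training data.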