Ionut Barbu & Tim Brakenhoff - How to build production-ready data science pipelines with Kedro

Learn how to build production-ready data science pipelines using Kedro, a framework that provides structure, organization, and customization options for data projects.

Key takeaways
  • Attending data science conferences and workshops can help improve code quality and project organization.
  • Kedro provides structure and organization for data science projects.
  • Kedro ships with predefined dataset types for common storage formats such as CSV, Excel, and Parquet.
  • Kedro represents each pipeline as a DAG (Directed Acyclic Graph), which can be visualized with Kedro-Viz.
  • Kedro can be used to create a reproducible, maintainable, and modular data science pipeline.
  • A data catalog is essential for declaring where datasets are stored and for tracking changes to them.
  • Integrate with MLflow to version and track machine learning models (see the tracking sketch after this list).
  • Use YAML files for configuration, such as the data catalog and pipeline parameters.
  • Kedro provides a modular package for building data science pipelines.
  • Kedro supports multiple deployment options, such as AWS, Azure, or manual deployment.
  • Adopting Kedro's structure and organization adds value to a project from the start.
  • Kedro makes it easier to change code safely and to manage intermediate datasets.
  • Kedro provides a way to stitch multiple nodes together into a pipeline (see the pipeline sketch after this list).
  • Kedro can be used to create a structured data catalog (see the catalog.yml sketch after this list).
  • Use Kedro to create high-quality code and useful data engineering pipelines.
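
To make the data catalog takeaway concrete, here is a minimal catalog.yml sketch showing how datasets in different formats might be declared. The dataset names and file paths are made up for illustration, and the exact dataset class names depend on your Kedro version (older releases use pandas.CSVDataSet instead of pandas.CSVDataset, for example):

    # conf/base/catalog.yml -- hypothetical entries for illustration
    companies:
      type: pandas.CSVDataset
      filepath: data/01_raw/companies.csv

    shuttles:
      type: pandas.ExcelDataset
      filepath: data/01_raw/shuttles.xlsx
      load_args:
        engine: openpyxl

    model_input_table:
      type: pandas.ParquetDataset
      filepath: data/03_primary/model_input_table.parquet

Any node in the project can then load or save these datasets by name, while the catalog keeps the actual storage locations in one place.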
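The pipeline sketch below shows how multiple nodes are stitched together in recent Kedro versions. The functions, dataset names ("companies", "preprocessed_companies", and so on), and column names are assumptions for illustration; dataset names must match entries in catalog.yml or be handled as in-memory datasets by Kedro:

    # Hypothetical pipeline: two nodes chained through named datasets.
    import pandas as pd
    from kedro.pipeline import Pipeline, node, pipeline


    def preprocess_companies(companies: pd.DataFrame) -> pd.DataFrame:
        """Example cleaning step: normalise column names."""
        companies.columns = [c.strip().lower() for c in companies.columns]
        return companies


    def create_model_input_table(
        companies: pd.DataFrame, shuttles: pd.DataFrame
    ) -> pd.DataFrame:
        """Example join step producing the table used for modelling."""
        return shuttles.merge(companies, left_on="company_id", right_on="id")


    def create_pipeline() -> Pipeline:
        return pipeline(
            [
                node(
                    preprocess_companies,
                    inputs="companies",
                    outputs="preprocessed_companies",
                    name="preprocess_companies_node",
                ),
                node(
                    create_model_input_table,
                    inputs=["preprocessed_companies", "shuttles"],
                    outputs="model_input_table",
                    name="create_model_input_table_node",
                ),
            ]
        )

Because each node only declares its inputs and outputs by name, Kedro can infer the DAG, run the nodes in the right order, and persist or cache the intermediate datasets according to the catalog.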
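For the MLflow takeaway, here is a minimal sketch of tracking a model from inside a Kedro node using the plain MLflow API (the kedro-mlflow plugin offers tighter integration via hooks and configuration, but the plain API keeps the example self-contained). The model type, parameter names, and the "price" target column are illustrative assumptions:

    import mlflow
    import mlflow.sklearn
    import pandas as pd
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import r2_score


    def train_model(model_input_table: pd.DataFrame) -> LinearRegression:
        """Train a model and log parameters, metrics, and the artifact to MLflow."""
        X = model_input_table.drop(columns=["price"])
        y = model_input_table["price"]

        with mlflow.start_run(run_name="train_model"):
            model = LinearRegression()
            model.fit(X, y)

            # Log run details so every pipeline execution is versioned and comparable.
            mlflow.log_param("model_type", "LinearRegression")
            mlflow.log_metric("r2_train", r2_score(y, model.predict(X)))
            mlflow.sklearn.log_model(model, artifact_path="model")

        return model

Registering this function as a node (with "model_input_table" as input) ties model versioning to the same pipeline runs that produce the training data.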