Datta & Rodríguez - Building the composable Python data stack with Kedro & Ibis | PyData London 2024

Build a composable Python data stack with Kedro, a data pipeline framework, and Ibis, a portable dataframe library that compiles expressions to SQL backends, for efficient data processing and flexible pipeline reuse.

Key takeaways
  • Kedro is a Python framework for building data pipelines that integrates with Ibis for querying data.
  • The goal is to process data with Ibis and create a data pipeline with Kedro.
  • Through Ibis, a Kedro pipeline can connect to various backends, including DuckDB, Postgres, and more (see the connection sketch after this list).
  • A key feature of Kedro is that it separates I/O and configuration from the data processing logic, making the pipeline easier to read and modify.
  • The catalog is a central configuration file that defines the datasets and how they are loaded and saved.
  • Kedro provides a way to create a model input table by consolidating multiple datasets.
  • Ibis provides support for various operations, including filter, group by, aggregation, and sort.
  • Ibis also supports joins and allows operations to be chained into a single expression (see the expression sketch after this list).
  • Kedro provides an easy way to create a pipeline as a sequence of nodes (see the pipeline sketch after this list).
  • The pipeline can be reused across multiple backends, making it a flexible solution for data processing.
  • The catalog can be easily extended with additional datasets and backend connections.
  • Ibis provides advanced features such as UDFs and data type conversion (see the UDF sketch after this list).
  • The speaker emphasizes that Kedro is designed to be extensible and flexible, and that it can be used with various databases and data processing backends.
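
A minimal sketch of how the backend connection can be swapped while the Ibis expression code stays the same; the file, host, and table names are illustrative assumptions, not values from the talk:

```python
import ibis

# The same expression code can target different backends; only the connection changes.
con = ibis.duckdb.connect("spaceflights.ddb")  # local DuckDB file (hypothetical name)
# con = ibis.postgres.connect(host="localhost", user="kedro", database="spaceflights")

companies = con.table("companies")   # lazy reference to a table on the backend
print(companies.head().execute())    # execution happens on the chosen backend
```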
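
A sketch of the Ibis operations listed above, chained into a single lazy expression; the in-memory tables and column names are illustrative:

```python
import ibis
from ibis import _

# Small in-memory tables so the sketch runs anywhere (hypothetical columns)
reviews = ibis.memtable({"shuttle_id": [1, 1, 2], "rating": [4.5, 3.0, 5.0]})
shuttles = ibis.memtable({"shuttle_id": [1, 2], "company_id": [10, 20]})

# filter -> join -> group by -> aggregation -> sort, all lazily chained
expr = (
    reviews.filter(_.rating >= 3.5)            # filter
    .join(shuttles, "shuttle_id")              # join on the shared key
    .group_by("company_id")                    # group by
    .aggregate(avg_rating=_.rating.mean())     # aggregation
    .order_by(ibis.desc("avg_rating"))         # sort
)
print(expr.execute())  # compiles to SQL and runs on the default (DuckDB) backend
```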
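
A sketch of a Kedro node and pipeline that consolidates several tables into a model input table; the function body, dataset names, and join keys are assumptions in the spirit of Kedro's spaceflights example, not the exact code from the talk:

```python
import ibis
from kedro.pipeline import node, pipeline


def create_model_input_table(
    shuttles: ibis.Table, companies: ibis.Table, reviews: ibis.Table
) -> ibis.Table:
    """Consolidate several source tables into a single model input table."""
    return shuttles.join(reviews, "shuttle_id").join(companies, "company_id")


# Nodes map plain Python functions onto named catalog datasets.
model_input_pipeline = pipeline(
    [
        node(
            func=create_model_input_table,
            inputs=["shuttles", "companies", "reviews"],
            outputs="model_input_table",
            name="create_model_input_table_node",
        ),
    ]
)
```

In the catalog, each of those names would map to a dataset definition (for example the ibis.TableDataset from kedro-datasets), so the same node code can be pointed at a different backend purely through configuration.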
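
A sketch of the advanced features mentioned: a Python scalar UDF and an explicit data type cast; the table and column names are illustrative:

```python
import ibis
from ibis import _


@ibis.udf.scalar.python
def shorten(name: str) -> str:
    """A Python scalar UDF, executed row by row on the backend."""
    return name[:3].upper()


t = ibis.memtable({"company": ["Acme Ltd", "Globex"], "price": ["10.5", "7.25"]})

expr = t.mutate(
    code=shorten(t.company),        # apply the UDF to a column
    price=_.price.cast("float64"),  # convert the string column to a float
)
print(expr.execute())
```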