Datta & Rodríguez - Building the composable Python data stack with Kedro & Ibis | PyData London 2024

Build a composable Python data stack with Kedro, a data pipeline framework, and Ibis, a portable dataframe library that compiles expressions to SQL backends, for efficient data processing and flexible pipeline reuse.

Key takeaways
  • Kedro is a Python framework for building data pipelines that integrates with Ibis for querying data.
  • The goal is to process data with Ibis and create a data pipeline with Kedro.
  • Through Ibis, a Kedro pipeline can connect to various backends, including DuckDB, Postgres, and more (see the connection sketch after this list).
  • A key feature of Kedro is that it separates I/O and configuration from the data processing logic, making the pipeline easier to read and modify.
  • The catalog is a central configuration file that defines the datasets and how they are loaded and saved.
  • Kedro provides a way to create a model input table by consolidating multiple datasets.
  • Ibis provides support for various operations, including filter, group by, aggregation, and sort.
  • Ibis also supports joins and allows operations to be chained into a single expression (see the expression sketch after this list).
  • Kedro provides an easy way to create a pipeline as a sequence of nodes (see the pipeline sketch after this list).
  • The pipeline can be reused across multiple backends, making it a flexible solution for data processing.
  • The catalog can be easily extended with additional datasets and backend connections.
  • Ibis provides advanced features such as UDFs and data type conversion (see the UDF sketch after this list).
  • The speaker emphasizes that Kedro is designed to be extensible and flexible, and that it can be used with various databases and data processing backends.
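
A minimal sketch of how the backend connection can be swapped while the Ibis expression code stays the same; the file, host, and table names are illustrative assumptions, not values from the talk:

```python
import ibis

# The same expression code can target different backends; only the connection changes.
con = ibis.duckdb.connect("spaceflights.ddb")  # local DuckDB file (hypothetical name)
# con = ibis.postgres.connect(host="localhost", user="kedro", database="spaceflights")

companies = con.table("companies")   # lazy reference to a table on the backend
print(companies.head().execute())    # execution happens on the chosen backend
```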
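
A sketch of the Ibis operations listed above, chained into a single lazy expression; the in-memory tables and column names are illustrative:

```python
import ibis
from ibis import _

# Small in-memory tables so the sketch runs anywhere (hypothetical columns)
reviews = ibis.memtable({"shuttle_id": [1, 1, 2], "rating": [4.5, 3.0, 5.0]})
shuttles = ibis.memtable({"shuttle_id": [1, 2], "company_id": [10, 20]})

# filter -> join -> group by -> aggregation -> sort, all lazily chained
expr = (
    reviews.filter(_.rating >= 3.5)            # filter
    .join(shuttles, "shuttle_id")              # join on the shared key
    .group_by("company_id")                    # group by
    .aggregate(avg_rating=_.rating.mean())     # aggregation
    .order_by(ibis.desc("avg_rating"))         # sort
)
print(expr.execute())  # compiles to SQL and runs on the default (DuckDB) backend
```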
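
A sketch of a Kedro node and pipeline that consolidates several tables into a model input table; the function body, dataset names, and join keys are assumptions in the spirit of Kedro's spaceflights example, not the exact code from the talk:

```python
import ibis
from kedro.pipeline import node, pipeline


def create_model_input_table(
    shuttles: ibis.Table, companies: ibis.Table, reviews: ibis.Table
) -> ibis.Table:
    """Consolidate several source tables into a single model input table."""
    return shuttles.join(reviews, "shuttle_id").join(companies, "company_id")


# Nodes map plain Python functions onto named catalog datasets.
model_input_pipeline = pipeline(
    [
        node(
            func=create_model_input_table,
            inputs=["shuttles", "companies", "reviews"],
            outputs="model_input_table",
            name="create_model_input_table_node",
        ),
    ]
)
```

In the catalog, each of those names would map to a dataset definition (for example the ibis.TableDataset from kedro-datasets), so the same node code can be pointed at a different backend purely through configuration.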
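
A sketch of the advanced features mentioned: a Python scalar UDF and an explicit data type cast; the table and column names are illustrative:

```python
import ibis
from ibis import _


@ibis.udf.scalar.python
def shorten(name: str) -> str:
    """A Python scalar UDF, executed row by row on the backend."""
    return name[:3].upper()


t = ibis.memtable({"company": ["Acme Ltd", "Globex"], "price": ["10.5", "7.25"]})

expr = t.mutate(
    code=shorten(t.company),        # apply the UDF to a column
    price=_.price.cast("float64"),  # convert the string column to a float
)
print(expr.execute())
```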