Tam-Sanh Nguyen - Writing and Scaling Collaborative Data Pipelines with Kedro

Learn how to write and scale collaborative data pipelines with Kedro, an open-source Python framework that simplifies and standardizes pipeline code, making it more maintainable and scalable.

Key takeaways
  • Data pipelines start small but can grow complex and challenging to maintain.
  • Kedro is an open-source Python framework that simplifies and standardizes data pipelines, structuring them as small, reusable nodes composed into pipelines, which makes them more maintainable and scalable (see the pipeline sketch after this list).
  • Kedro separates configuration from code: data sources live in a data catalog and tunable values in parameter files, so pipeline behavior can be changed without editing the pipeline itself (see the configuration sketch after this list).
  • Pipelines can grow too complex to hold in your head; Kedro-Viz renders them as an interactive dependency graph (see the note after this list).
  • Data pipelines require a balance between data engineering and data science, and Kedro helps facilitate that balance.
  • Kedro makes pipelines easy to package, deploy, and share, and provides mechanisms for versioning datasets and tracking pipeline versions.
  • Kedro is designed to be extensible and interoperates with other tools and technologies: its Kedro-Viz front end is built with React, and plugins such as kedro-airflow deploy pipelines to Airflow (see the note after this list).
  • Data pipelines are often hard to maintain because they lack standardized organization and structure; Kedro addresses this directly.
  • Kedro provides standardized conventions for organizing and naming pipeline components, such as its layered data directories, making complex pipelines easier to understand and work with.
  • Writing data pipelines can be viewed as a form of “altruistic programming”: code written deliberately so that others can understand and extend it.
  • Kedro aims to promote a culture of collaboration and standardized practices in the field of data engineering and science.
  • Building data pipelines resembles “audio engineering”: just as an engineer mixes raw recordings into a polished track, a pipeline refines raw data into meaningful insights, and Kedro provides the tooling for that refinement.
  • Kedro is designed to be flexible and adaptable, allowing it to be used in a wide variety of contexts and applications.
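
The sketches below make these takeaways concrete. First, the node-and-pipeline structure: a minimal sketch using Kedro's public kedro.pipeline API, with the dataset and function names invented for illustration.

```python
from kedro.pipeline import Pipeline, node


def preprocess_companies(companies):
    # A node is a plain Python function: datasets in, datasets out.
    companies["iata_approved"] = companies["iata_approved"] == "t"
    return companies


def create_pipeline(**kwargs) -> Pipeline:
    # Nodes are wired together by dataset name; Kedro infers the
    # execution order from these input/output dependencies.
    return Pipeline(
        [
            node(
                func=preprocess_companies,
                inputs="companies",  # loaded via the data catalog
                outputs="preprocessed_companies",
                name="preprocess_companies_node",
            ),
        ]
    )
```

Because every node declares its inputs and outputs by name, a teammate can extend the pipeline without reading every function body, which is exactly the standardized, collaborative structure the talk emphasizes.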
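Second, the configuration system. Kedro declares datasets in a YAML data catalog under conf/, so I/O lives outside the code. The entries below follow the conventions in Kedro's documentation, though exact dataset type names vary across Kedro versions and the file paths here are illustrative.

```yaml
# conf/base/catalog.yml - declares where each named dataset lives
companies:
  type: pandas.CSVDataSet
  filepath: data/01_raw/companies.csv

preprocessed_companies:
  type: pandas.ParquetDataSet
  filepath: data/02_intermediate/preprocessed_companies.parquet
  versioned: true  # keep a timestamped copy of each run's output
```

Parameters work the same way: tunable values go in conf/base/parameters.yml and are injected into nodes as inputs (for example, params:test_size), so changing pipeline behavior never requires editing pipeline code. Note the numbered data directories (01_raw, 02_intermediate, and so on), part of the standardized layout mentioned above.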
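Finally, visualization and deployment. Running kedro viz inside a project starts Kedro-Viz, the React-based tool that draws the pipeline as an interactive graph of nodes and datasets. For deployment, plugins such as kedro-airflow can convert a Kedro project into an Airflow DAG, so the same standardized pipeline that runs locally during development can run on a production scheduler.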