Juan Luis Cano Rodríguez - Who needs ChatGPT? Rock solid AI pipelines with Hugging Face and Kedro

Learn how to build rock-solid AI pipelines using Hugging Face and Kedro, a Python framework for data pipelines that integrates seamlessly with various data storage and processing systems.

Key takeaways
  • Kedro is a Python framework for data pipelines that can easily integrate with various data storage and processing systems.
  • Fill-mask is a task in which a masked language model predicts the tokens hidden behind a mask placeholder; the author builds on such a model to create a summarizer for a list of posts.
  • The summarizer is built with the Hugging Face transformers library together with tools from the PyData ecosystem.
  • Kedro has a wide range of integrations, including Airflow, Argo, Dask, Kubeflow, and Prefect, which make it easier to deploy and orchestrate pipelines on existing infrastructure.
  • The author scaffolds a pipeline with the Kedro CLI (`kedro pipeline create`) and defines a node wrapping the fill-mask model, which takes in a list of posts and returns a summarized version of them.
  • The author also defines a Kedro dataset in the data catalog and points it at a MinIO bucket, which Kedro reads through MinIO's S3-compatible API.
  • Kedro pipelines can handle large-scale data processing and, through these orchestrator integrations, scale horizontally, making Kedro suitable for production use cases.
  • The author stresses the importance of disciplined data processing and recaps the key steps to follow when building a data pipeline with Kedro.
  • Kedro has a strong focus on software engineering principles and is designed to make it easy to build and maintain large-scale data pipelines.
  • The author uses Kedro-Viz to visualize the pipeline and inspect how data flows through it.
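
Reading from MinIO in Kedro is usually done declaratively through the data catalog, since MinIO exposes an S3-compatible API that fsspec/s3fs can talk to. A hypothetical catalog entry (bucket, file, and credential names invented for illustration; the dataset class is `pandas.CSVDataSet` in older kedro-datasets releases) might look like:

```yaml
# conf/base/catalog.yml -- dataset backed by a MinIO bucket
posts:
  type: pandas.CSVDataset
  filepath: s3://my-bucket/posts.csv
  credentials: minio_credentials

# conf/local/credentials.yml -- forwarded to fsspec/s3fs
minio_credentials:
  key: my-access-key
  secret: my-secret-key
  client_kwargs:
    endpoint_url: http://localhost:9000
```

The `endpoint_url` override is what redirects S3 traffic from AWS to the local MinIO server.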
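
A Kedro node is just a plain Python function wired into a pipeline, so the summarizer node described above could be sketched roughly as follows. The function name, dataset names, and the injected `summarizer` callable are illustrative assumptions, not taken from the talk:

```python
from typing import Callable

def summarize_posts(posts: list[str], summarizer: Callable) -> list[str]:
    """Summarize each post with an injected model callable.

    `summarizer` stands in for a Hugging Face pipeline: a callable
    that maps one text to a list of dicts containing a
    "summary_text" key, mirroring transformers' summarization
    pipeline output shape.
    """
    return [summarizer(post)[0]["summary_text"] for post in posts]

# A stub standing in for a real model, so the node logic can be
# exercised without downloading any weights:
stub = lambda text: [{"summary_text": text[:20]}]
summaries = summarize_posts(["a long post about Kedro pipelines"], stub)
```

In a real project this function would be registered with something like `node(summarize_posts, inputs=[...], outputs=...)` in the pipeline definition; injecting the model as an input keeps the node testable.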