Jim Dowling - From zero to a working ML system with Python, free serverless services + FTI pipelines

Build a machine learning system from scratch with Python, free serverless services, and feature pipelines, covering data ingestion, model training, and deployment.

Key takeaways
  • The speaker emphasizes that a machine learning system built with Python, free serverless services, and FTI (feature, training, inference) pipelines requires minimal setup and can be assembled quickly.
  • The feature store is a critical component that hides complexity from users and provides a unified interface to manage features, models, and data.
  • The speaker uses the example of air quality prediction to demonstrate how to build a feature pipeline, train a model, and deploy it as a minimal viable prediction service.
  • The pipeline consists of multiple stages, including data ingestion, feature engineering, training a model, and making predictions.
  • The speaker uses a free weather API and some static data to train the model and predict air quality in different cities.
  • The feature store lets users manage and version features, models, and data, and supports backfill pipelines that load historical data.
  • The speaker emphasizes the importance of testing and validation in machine learning, including feature monitoring and model validation.
  • He also highlights the need for data engineers to play a key role in building end-to-end ML systems.
  • The free version of Hopsworks provides unlimited storage, while the paid version adds features such as compute and support for larger models.
  • The speaker mentions that Modal provides a generous free tier for compute, but Hopsworks does not currently offer compute for free.
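
The pipeline stages listed above can be sketched in plain Python. This is a minimal illustration, not the speaker's code and not the Hopsworks API: ToyFeatureStore is a hypothetical in-memory stand-in for a feature store, the city name "ExampleCity" and all readings are made-up values, and a one-variable least-squares fit stands in for whatever model the talk actually trains. The point it shows is that the feature, training, and inference pipelines share no code and communicate only through the store.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class ToyFeatureStore:
    """Hypothetical in-memory stand-in for a feature store (not the Hopsworks API)."""
    rows: dict = field(default_factory=dict)    # (city, date) -> feature row
    models: dict = field(default_factory=dict)  # model name -> fitted parameters

    def insert(self, new_rows):
        # Upsert by (city, date) so backfill and daily runs use the same code path.
        for row in new_rows:
            self.rows[(row["city"], row["date"])] = row

    def read(self, city):
        # ISO date strings sort chronologically.
        matching = (r for r in self.rows.values() if r["city"] == city)
        return sorted(matching, key=lambda r: r["date"])


def feature_pipeline(store, raw_readings):
    """Turn raw readings into feature rows and write them to the store."""
    store.insert([
        {
            "city": r["city"],
            "date": r["date"],
            "wind_speed": float(r["wind_speed"]),
            "pm25": r.get("pm25"),  # label; None for days not yet observed
        }
        for r in raw_readings
    ])


def training_pipeline(store, city, model_name):
    """Fit pm25 = intercept + slope * wind_speed by least squares and save it."""
    rows = [r for r in store.read(city) if r["pm25"] is not None]
    xs = [r["wind_speed"] for r in rows]
    ys = [r["pm25"] for r in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    store.models[model_name] = {"intercept": y_bar - slope * x_bar, "slope": slope}


def inference_pipeline(store, city, model_name):
    """Predict pm25 for rows that have weather features but no label yet."""
    model = store.models[model_name]
    return {
        r["date"]: model["intercept"] + model["slope"] * r["wind_speed"]
        for r in store.read(city)
        if r["pm25"] is None
    }


# Made-up historical readings, used as a backfill.
historical = [
    {"city": "ExampleCity", "date": "2024-01-01", "wind_speed": 2, "pm25": 42},
    {"city": "ExampleCity", "date": "2024-01-02", "wind_speed": 4, "pm25": 34},
    {"city": "ExampleCity", "date": "2024-01-03", "wind_speed": 6, "pm25": 26},
    {"city": "ExampleCity", "date": "2024-01-04", "wind_speed": 8, "pm25": 18},
]
# Made-up forecast row: weather is known, air quality is not.
forecast = [{"city": "ExampleCity", "date": "2024-01-05", "wind_speed": 5}]

store = ToyFeatureStore()
feature_pipeline(store, historical)  # backfill run
feature_pipeline(store, forecast)    # daily run
training_pipeline(store, "ExampleCity", "pm25_linear")
predictions = inference_pipeline(store, "ExampleCity", "pm25_linear")
print(predictions)
```

Because each pipeline only reads from and writes to the store, each one could in principle run on its own schedule or on a different serverless service, which is the separation the takeaways describe.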