Vino Duraisamy - From raw data to interactive data app in an hour | PyData Global 2023

Learn how to build end-to-end ML workflows in Snowflake using Snowpark's Python capabilities, from data processing to model deployment and interactive visualization.

Key takeaways
  • Snowpark enables end-to-end ML workflows within Snowflake by allowing Python, Java, and Scala code to run alongside SQL

  • Key components include the DataFrame API for data engineering and the ML Modeling API for machine learning, removing the need for external compute environments (see the DataFrame sketch after this list)

  • The Model Registry provides versioning for ML models, with metadata tracking and straightforward deployment options

  • Zero-copy cloning allows multiple teams to work on the same data without creating redundant copies, while maintaining data governance (cloning sketch below)

  • Pre-processing and feature engineering can be done using familiar scikit-learn-style APIs (ordinal encoding, one-hot encoding, scaling, etc.; preprocessing sketch below)

  • Supports popular ML libraries such as scikit-learn and XGBoost while pushing execution down to Snowflake’s compute engine (training sketch below)

  • Eliminates data silos by keeping the entire workflow within Snowflake’s security boundary instead of moving data between environments

  • Model deployment is simplified to a two-step process: logging the model and deploying it for inference (registry sketch below)

  • Provides standardized environments for both development and production through container services and GPU-enabled compute options

  • Interactive visualization apps can be built using the Streamlit integration for model monitoring and results display (Streamlit sketch below)
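
Code sketches

The sketches below are illustrative only: connection parameters, table names, and column names (RAW_SALES, REGION, AMOUNT, and so on) are placeholders, not code from the talk.

A minimal sketch of the Snowpark DataFrame API for data engineering. Transformations are built lazily in Python and pushed down to Snowflake's compute engine:

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, sum as sum_

# Connection parameters are placeholders; supply your own account details.
session = Session.builder.configs({
    "account": "<account>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "<database>",
    "schema": "<schema>",
}).create()

# Hypothetical raw table; the filter and aggregation execute inside Snowflake.
raw = session.table("RAW_SALES")
per_region = (
    raw.filter(col("AMOUNT") > 0)
       .group_by("REGION")
       .agg(sum_("AMOUNT").alias("TOTAL_AMOUNT"))
)
per_region.show()  # runs the query in Snowflake and prints a small preview
```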
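
Zero-copy cloning is a Snowflake SQL feature; a hypothetical sketch below issues the CLONE statement through the same Snowpark session (table names are made up):

```python
# The clone shares the original table's storage until either copy is modified,
# so another team gets an instant working copy without duplicating data.
session.sql(
    "CREATE OR REPLACE TABLE ANALYTICS.FEATURES_CLONE CLONE ANALYTICS.FEATURES"
).collect()
```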
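
A sketch of scikit-learn-style preprocessing from the snowflake-ml-python package, assuming hypothetical column names; exact import paths and parameters may vary by package version:

```python
from snowflake.ml.modeling.preprocessing import OneHotEncoder, StandardScaler

df = session.table("CUSTOMER_FEATURES")  # hypothetical feature table

# One-hot encode a categorical column into derived output columns.
ohe = OneHotEncoder(
    input_cols=["PLAN_TYPE"],
    output_cols=["PLAN_TYPE_OHE"],
    drop_input_cols=True,
)
df = ohe.fit(df).transform(df)

# Standardize a numeric column into a new scaled column.
scaler = StandardScaler(
    input_cols=["MONTHLY_SPEND"],
    output_cols=["MONTHLY_SPEND_SCALED"],
)
df = scaler.fit(df).transform(df)
```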
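
A sketch of training an XGBoost classifier through the ML Modeling API, so training runs on Snowflake compute; the table, feature, and label names are assumptions:

```python
from snowflake.ml.modeling.xgboost import XGBClassifier

train_df = session.table("CUSTOMER_FEATURES_PREPARED")  # hypothetical prepared table

model = XGBClassifier(
    input_cols=["TENURE_MONTHS", "MONTHLY_SPEND_SCALED"],  # hypothetical features
    label_cols=["CHURNED"],                                 # hypothetical label
    output_cols=["PREDICTED_CHURN"],
)
model.fit(train_df)                    # training executes in Snowflake
predictions = model.predict(train_df)  # DataFrame with a PREDICTED_CHURN column
predictions.show()
```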
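
A sketch of the two-step deployment flow, logging the trained model and then calling it for inference. The registry interface in snowflake-ml-python has evolved across releases, so treat the exact calls as illustrative; model and version names are made up:

```python
from snowflake.ml.registry import Registry

registry = Registry(session=session)

# Step 1: log the model with version metadata.
model_version = registry.log_model(
    model,
    model_name="CHURN_MODEL",
    version_name="V1",
    comment="XGBoost churn model trained on CUSTOMER_FEATURES_PREPARED",
)

# Step 2: run inference against the registered version.
scored = model_version.run(train_df, function_name="predict")
scored.show()
```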
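
A minimal Streamlit sketch for displaying model results; the table and column names are hypothetical, and the session lookup assumes the app runs inside Snowflake (Streamlit in Snowflake):

```python
import streamlit as st
from snowflake.snowpark.context import get_active_session

st.title("Churn model monitoring")

session = get_active_session()
scored = session.table("CHURN_PREDICTIONS").limit(1000).to_pandas()  # hypothetical table

st.metric("Predicted churn rate", f"{scored['PREDICTED_CHURN'].mean():.1%}")
st.dataframe(scored)
```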