Select ML from Databases [PyCon DE & PyData Berlin 2024]

Learn how to integrate machine learning directly into databases, from built-in ML modules to custom implementations. Explore benefits, challenges & real-world use cases.

Key takeaways
  • ML functionality is increasingly being integrated directly into databases, allowing analysis and model training on data where it lives

  • Three main approaches for ML in databases:

    • Built-in ML modules (limited flexibility but easy to use)
    • Third-party integrations (medium flexibility)
    • Custom ML methods (full flexibility but more complex)
  • Benefits of database-integrated ML:

    • No ETL jobs needed
    • Reduced infrastructure complexity
    • Better security (data doesn’t leave the database)
    • Simplified deployment without extra microservices
    • Ability to scale individual services independently
  • Key considerations for implementation:

    • Model development can be done offline/locally
    • Models need to be packaged with dependencies
    • SQL queries can be used for model inference
    • Performance monitoring at the query level is important
    • User-defined functions (UDFs) enable custom ML integration
  • Common use cases:

    • Churn prediction
    • Insurance quote estimation
    • Anomaly detection
    • Time series analysis
    • Personalized recommendations
  • Limitations and challenges:

    • Built-in models may not fit all use cases
    • Need to balance flexibility vs ease of use
    • Model explainability varies by approach
    • Performance impact on database operations
    • Model lifecycle management requirements
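
The implementation considerations above (offline model development, packaging, SQL-based inference, query-level monitoring, and UDFs) can be sketched roughly as follows. This is a minimal illustration only, using Python's built-in sqlite3 module as a stand-in for a production database. The table and column names, the function name `churn_probability`, and the model weights are all hypothetical; none of them come from the talk.

```python
import json
import math
import sqlite3
import time

# Hypothetical model "trained offline": logistic-regression parameters for
# churn prediction. The numbers are invented for illustration only.
MODEL_JSON = json.dumps({
    "intercept": -1.0,
    "weights": {"months_inactive": 0.8, "support_tickets": 0.5},
})


def make_churn_udf(model_json):
    """Unpack the serialized model and return a scalar scoring function."""
    model = json.loads(model_json)
    intercept = model["intercept"]
    weights = model["weights"]

    def churn_probability(months_inactive, support_tickets):
        # Linear score followed by the logistic (sigmoid) function.
        z = (intercept
             + weights["months_inactive"] * months_inactive
             + weights["support_tickets"] * support_tickets)
        return 1.0 / (1.0 + math.exp(-z))

    return churn_probability


# In-memory database with a small, made-up customers table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, months_inactive REAL, support_tickets REAL)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 0.0, 0.0), (2, 2.0, 1.0), (3, 6.0, 4.0)],
)

# Register the Python function as a two-argument SQL function (the UDF).
conn.create_function("churn_probability", 2, make_churn_udf(MODEL_JSON))

# Inference is a plain SQL query; timing it gives a crude query-level metric.
start = time.perf_counter()
rows = conn.execute(
    "SELECT id, churn_probability(months_inactive, support_tickets) AS p "
    "FROM customers ORDER BY id"
).fetchall()
elapsed_ms = (time.perf_counter() - start) * 1000

for customer_id, probability in rows:
    print(f"customer {customer_id}: churn probability {probability:.3f}")
print(f"inference query took {elapsed_ms:.2f} ms")
```

In this sketch the data never leaves the database process and no separate model-serving service is involved, which matches the benefits listed above. The trade-off is that scoring runs inside the database, so its cost shows up as query latency.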