Patrick Deziel & Prema Roman - Reconceptualizing Machine Learning for the Real-time World

Learn how real-time machine learning overcomes traditional batch processing limitations, with insights on streaming architectures, online learning, and MLOps best practices.

Key takeaways
  • An estimated 87% of data science projects never make it to production, largely due to challenges with MLOps and deployment

  • Traditional batch machine learning faces several limitations:

    • Models become outdated quickly
    • Long iteration cycles for updates
    • High latency between data ingestion and model updates
    • Memory issues with large datasets
  • Real-time/online learning advantages:

    • Continuous model updates as new data arrives
    • Faster training cycles, learning one record at a time (see the sketch after this list)
    • More responsive to changing user behaviors
    • Lower latency between data receipt and model updates
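A minimal sketch of "learning one record at a time", assuming a scikit-learn `SGDClassifier` as the online learner; the `stream()` generator and its synthetic labels are stand-ins for a real feed, not anything from the talk.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

def stream(n_records=1000, seed=42):
    """Yield (features, label) pairs one at a time; placeholder for a real feed."""
    rng = np.random.default_rng(seed)
    for _ in range(n_records):
        x = rng.normal(size=4)
        y = int(x.sum() > 0)  # synthetic label, just for the example
        yield x, y

model = SGDClassifier()        # linear model that supports incremental updates
classes = np.array([0, 1])     # partial_fit needs the full label set up front
correct, seen = 0, 0

for x, y in stream():
    x = x.reshape(1, -1)
    if seen > 0:
        # Prequential ("test then train") evaluation: score each record
        # before the model has learned from it.
        correct += int(model.predict(x)[0] == y)
    model.partial_fit(x, [y], classes=classes)  # update on a single record
    seen += 1

print(f"online accuracy: {correct / (seen - 1):.2%}")
```

Because each update touches only one record, the model stays current with the stream and never needs to hold the full dataset in memory.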
  • Ensign platform features:

    • Managed streaming service for data scientists
    • Publish/subscribe model for data streams (see the sketch after this list)
    • Historical data access and querying
    • Python and Go SDK support
    • Built specifically for ML/data science workflows
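The publish/subscribe model can be pictured with a small in-process sketch: producers publish events to named topics and subscribers receive them asynchronously. The `ToyBroker` class below is invented purely for illustration and is not the PyEnsign API; the real SDK layers managed topics, persistence, and historical querying on top of this same pattern.

```python
import asyncio
from collections import defaultdict

class ToyBroker:
    """In-process stand-in for a streaming broker; NOT the Ensign SDK."""
    def __init__(self):
        self._topics = defaultdict(list)  # topic name -> subscriber queues

    def subscribe(self, topic: str) -> asyncio.Queue:
        queue = asyncio.Queue()
        self._topics[topic].append(queue)
        return queue

    async def publish(self, topic: str, event: dict) -> None:
        # Fan the event out to every subscriber of the topic.
        for queue in self._topics[topic]:
            await queue.put(event)

async def producer(broker: ToyBroker):
    for i in range(3):
        await broker.publish("transactions", {"id": i, "amount": 10.0 * i})
        await asyncio.sleep(0.1)

async def consumer(queue: asyncio.Queue):
    for _ in range(3):
        event = await queue.get()
        print("received:", event)

async def main():
    broker = ToyBroker()
    queue = broker.subscribe("transactions")  # subscribe before events start flowing
    await asyncio.gather(producer(broker), consumer(queue))

asyncio.run(main())
```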
  • Asynchronous data science approach:

    • Replaces batch processing with continuous data streams
    • Enables concurrent processing of different components
    • Separates data ingestion from model training (see the sketch after this list)
    • Makes deployment and updates more efficient
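One way to read the asynchronous approach is as separate coroutines for ingestion and training connected by a queue, so neither blocks the other. The sketch below uses Python's asyncio; the random "sensor" feed and the running-mean "model" are placeholders for a real stream and a real online learner.

```python
import asyncio
import random

async def ingest(queue: asyncio.Queue, n_events: int = 20):
    """Simulate a data source pushing records as they arrive."""
    for _ in range(n_events):
        await queue.put({"value": random.gauss(0.0, 1.0)})
        await asyncio.sleep(0.05)  # records trickle in over time
    await queue.put(None)          # sentinel: no more data

async def train(queue: asyncio.Queue):
    """Update a toy model (a running mean) one record at a time."""
    count, mean = 0, 0.0
    while True:
        record = await queue.get()
        if record is None:
            break
        count += 1
        mean += (record["value"] - mean) / count  # incremental mean update
        print(f"seen {count} records, running mean = {mean:.3f}")

async def main():
    # Bounded queue provides back-pressure between ingestion and training.
    queue = asyncio.Queue(maxsize=100)
    await asyncio.gather(ingest(queue), train(queue))

asyncio.run(main())
```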
  • Best use cases for real-time ML:

    • Recommendation systems
    • Anomaly detection (see the sketch after this list)
    • Personalized models
    • IoT applications
    • Systems requiring immediate feedback loops
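Anomaly detection is a natural fit because each new record can be scored against statistics maintained online. The sketch below keeps a running mean and variance with Welford's algorithm and flags values more than three standard deviations out; the synthetic feed and the 3-sigma threshold are illustrative choices, not anything prescribed in the talk.

```python
import math
import random

class RunningStats:
    """Online mean/variance via Welford's algorithm."""
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        return math.sqrt(self.m2 / self.n) if self.n > 1 else 0.0

stats = RunningStats()
random.seed(7)
for i in range(1, 501):
    value = random.gauss(0.0, 1.0)
    if i == 250:
        value = 8.0  # inject an obvious outlier
    # Score before updating so the anomaly doesn't distort its own baseline.
    if stats.n > 30 and stats.std > 0 and abs(value - stats.mean) > 3 * stats.std:
        print(f"record {i}: anomaly detected (value={value:.2f})")
    stats.update(value)
```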