Emeli Dral - More like this: monitoring recommender systems in production | PyData Global 2023

Learn effective strategies for monitoring recommender systems in production, from business KPIs to data quality metrics, with practical tips for A/B testing & issue detection.

Key takeaways
  • Start with business KPIs (revenue, conversions) as the primary metrics for monitoring recommender systems in production

  • Implement multiple layers of monitoring:

    • Service health metrics (memory usage, response time)
    • Data quality and drift monitoring
    • Online quality metrics (clicks, views)
    • Offline proxy metrics (precision@k, recall@k)
    • Beyond-accuracy metrics (diversity, novelty, serendipity)
  • Use proxy metrics to detect issues faster, since business metrics such as revenue often arrive with delayed feedback

  • Consider attribution challenges when measuring recommender system impact, especially for long purchase cycles

  • Implement A/B testing to properly measure economic impact by comparing user segments with and without recommendations

  • Monitor data quality and drift closely, because data pipeline issues can significantly degrade model performance

  • Create synthetic “avatar” users with specific properties for testing and debugging recommendation behavior

  • Calculate metrics in batch mode to analyze trends and compare against reference data

  • Establish correlation between offline metrics and online business metrics for better model selection

  • Use techniques such as multi-armed bandits to occasionally serve random recommendations and avoid feedback loops
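
The offline proxy metrics named in the takeaways (precision@k, recall@k) can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function names, the list-and-set input format, and the choice to divide precision by k even when fewer than k items are recommended are all assumptions made here.

```python
def _hits_at_k(recommended, relevant, k):
    """Count how many of the top-k recommended items are relevant."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # Assumes `recommended` is ordered best-first and has no duplicates.
    return sum(1 for item in recommended[:k] if item in relevant)


def precision_at_k(recommended, relevant, k):
    """Share of the top-k slots filled with items the user interacted with."""
    # Convention assumed here: always divide by k, so short lists are penalized.
    return _hits_at_k(recommended, relevant, k) / k


def recall_at_k(recommended, relevant, k):
    """Share of the user's relevant items that appear in the top k."""
    if not relevant:
        # No relevant items for this user; 0.0 is one common convention.
        return 0.0
    return _hits_at_k(recommended, relevant, k) / len(relevant)


# Hypothetical example data for a single user.
recommended = ["a", "b", "c", "d", "e"]   # ranked model output
relevant = {"a", "c", "f"}                # items the user actually clicked or bought

p3 = precision_at_k(recommended, relevant, k=3)
r3 = recall_at_k(recommended, relevant, k=3)
print(f"precision@3={p3:.3f} recall@3={r3:.3f}")
```

In batch mode, these per-user values would typically be averaged over all users in a time window and compared against the same average on a reference window, which is one way to track the trends the takeaways describe.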