Wojtek Kuberski - The ML Monitoring Flow for Models Deployed to Production | PyData Amsterdam 2024
Learn how to effectively monitor ML models in production, detect performance degradation, and implement best practices for model maintenance and selective retraining.
- Models deployed to production commonly experience performance degradation over time; studies show roughly 20% average degradation, with some models degrading by more than 80%
- Two main types of model drift (contrasted in the sketch below):
  - Covariate shift: changes in the input data distribution
  - Concept drift: changes in the relationship between features and the target
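A minimal way to see the difference: covariate shift is visible from the inputs alone, while concept drift only shows up once the feature-target relationship is examined. The sketch below uses synthetic data and a plain two-sample KS test; the distributions and the single-feature setup are illustrative assumptions, not material from the talk.

```python
# Sketch contrasting covariate shift and concept drift on synthetic data.
# All data here is assumed for illustration.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

# Reference period: x ~ N(0, 1) and the target follows y = 1 when x > 0.
x_ref = rng.normal(0.0, 1.0, 5_000)
y_ref = (x_ref > 0).astype(int)

# Covariate shift: the input distribution moves, but the x -> y relationship is unchanged.
x_prod = rng.normal(1.0, 1.0, 5_000)

# Concept drift: same inputs, but the decision boundary has moved to x > 1.
y_prod_concept = (x_prod > 1.0).astype(int)

# Covariate shift is detectable from the inputs alone.
stat, p_value = ks_2samp(x_ref, x_prod)
print(f"KS test on inputs: statistic={stat:.3f}, p-value={p_value:.2e}")

# Concept drift is not: the change only becomes visible once labels
# (or reliable estimates of them) are compared against the reference.
p_ref = y_ref[x_ref > 0].mean()             # P(y=1 | x > 0) in the reference period
p_prod = y_prod_concept[x_prod > 0].mean()  # same conditional after the concept moved
print(f"P(y=1 | x > 0): reference={p_ref:.2f}, production={p_prod:.2f}")
```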
- Traditional data drift detection methods have limitations:
  - High false-positive rates
  - Cannot reliably indicate actual impact on model performance
  - Univariate drift methods miss important multivariate changes (see the example below)
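The multivariate blind spot is easy to reproduce: below, two features keep identical marginal distributions while their correlation flips sign, so per-feature tests typically report no drift even though the joint distribution has changed completely. The synthetic data and the 0.05 threshold are assumptions for illustration.

```python
# Sketch of a multivariate change that univariate drift tests usually miss.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
n = 10_000

# Reference: two standard-normal features with strong positive correlation.
ref = rng.multivariate_normal([0, 0], [[1.0, 0.9], [0.9, 1.0]], size=n)

# Production: identical marginals, but the correlation has flipped sign.
prod = rng.multivariate_normal([0, 0], [[1.0, -0.9], [-0.9, 1.0]], size=n)

# Univariate tests on each feature will usually see nothing unusual...
for i in range(2):
    stat, p = ks_2samp(ref[:, i], prod[:, i])
    print(f"feature {i}: KS p-value = {p:.3f} -> {'drift' if p < 0.05 else 'no drift'}")

# ...yet the joint structure has changed drastically.
print("reference correlation: ", np.corrcoef(ref.T)[0, 1].round(2))
print("production correlation:", np.corrcoef(prod.T)[0, 1].round(2))
```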
- Key monitoring approaches:
  - Confidence-Based Performance Estimation (CBPE) for classification tasks (sketched below)
  - Direct Loss Estimation (DLE) for regression tasks
  - Model calibration to obtain reliable probability estimates
  - Estimating performance metrics without access to ground-truth labels
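The core idea behind CBPE can be sketched in a few lines: if the model's probability scores are well calibrated, each prediction contributes its probability mass to an expected confusion matrix, from which metrics such as accuracy, precision, and recall can be estimated without waiting for ground-truth labels. The helper below is a simplified illustration of that idea (the threshold, function name, and synthetic scores are assumptions); production-grade implementations exist in open-source libraries such as NannyML.

```python
# Simplified sketch of Confidence-Based Performance Estimation (CBPE):
# with well-calibrated probabilities, expected performance can be estimated
# from the scores alone, before any labels arrive.
import numpy as np

def estimate_classification_metrics(calibrated_proba, threshold=0.5):
    """Estimate accuracy/precision/recall from calibrated P(y=1) scores alone."""
    proba = np.asarray(calibrated_proba, dtype=float)
    pred_positive = proba >= threshold

    # Each prediction contributes its probability mass to the expected confusion matrix.
    tp = proba[pred_positive].sum()         # predicted 1, truly 1 with probability p
    fp = (1 - proba[pred_positive]).sum()   # predicted 1, truly 0 with probability 1 - p
    fn = proba[~pred_positive].sum()        # predicted 0, truly 1 with probability p
    tn = (1 - proba[~pred_positive]).sum()  # predicted 0, truly 0 with probability 1 - p

    total = tp + fp + fn + tn
    return {
        "estimated_accuracy": (tp + tn) / total,
        "estimated_precision": tp / (tp + fp) if (tp + fp) > 0 else float("nan"),
        "estimated_recall": tp / (tp + fn) if (tp + fn) > 0 else float("nan"),
    }

# Example: a batch of production predictions for which no labels exist yet.
rng = np.random.default_rng(7)
production_scores = rng.beta(2, 2, size=1_000)  # stand-in for calibrated model outputs
print(estimate_classification_metrics(production_scores))
```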
- Best practices for production ML monitoring:
  - Don’t rely solely on data drift signals
  - Consider business impact and the costs of false positives/negatives
  - Monitor performance across different data segments (see the per-segment example below)
  - Set up early-warning systems before business impact occurs
  - Account for seasonality in monitoring metrics
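To illustrate why segment-level monitoring matters, the hypothetical sketch below shows an aggregate accuracy that looks healthy while one small segment has quietly degraded. The column names, segment split, simulated accuracies, and alert threshold are all assumptions made for the example.

```python
# Hypothetical per-segment monitoring: a global metric can hide a degraded segment.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 6_000
df = pd.DataFrame({
    "segment": rng.choice(["web", "mobile", "partner_api"], size=n, p=[0.5, 0.4, 0.1]),
    "y_true": rng.integers(0, 2, size=n),
})
# Simulate a model that is accurate everywhere except the small partner_api segment.
correct_prob = np.where(df["segment"] == "partner_api", 0.60, 0.92)
df["y_pred"] = np.where(rng.random(n) < correct_prob, df["y_true"], 1 - df["y_true"])

overall = (df["y_true"] == df["y_pred"]).mean()
per_segment = (
    df.assign(correct=df["y_true"] == df["y_pred"])
      .groupby("segment")["correct"]
      .agg(accuracy="mean", volume="size")
)
print(f"overall accuracy: {overall:.3f}")  # looks healthy in aggregate
print(per_segment)
print("segments below alert threshold (0.80):")
print(per_segment[per_segment["accuracy"] < 0.80])
```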
- Model retraining considerations:
  - Retrain selectively, based on detected concept drift
  - Retraining may not help if the issue is pure covariate shift
  - Focus retraining on specific data segments showing degradation
  - Validate the impact of retraining with proper performance metrics, as in the comparison sketch below
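One way to validate a retraining decision is to evaluate the current model and a candidate retrained on recent data against the same recent holdout before promoting anything. The sketch below sets up a synthetic concept-drift scenario; the model choice, data-generating process, and ROC AUC as the comparison metric are assumptions for illustration.

```python
# Hedged sketch: compare the current model with a retrained candidate on a recent holdout.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(11)

def make_data(n, weights):
    """Binary target whose relationship to the features is set by `weights`."""
    X = rng.normal(size=(n, 3))
    logits = X @ np.asarray(weights)
    y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)
    return X, y

# Historical data vs. recent data where the concept (the feature weights) has drifted.
X_old, y_old = make_data(20_000, [2.0, -1.0, 0.5])
X_recent, y_recent = make_data(4_000, [0.5, 2.0, -1.0])
X_holdout, y_holdout = make_data(4_000, [0.5, 2.0, -1.0])

current_model = LogisticRegression().fit(X_old, y_old)
candidate_model = LogisticRegression().fit(X_recent, y_recent)

# Only promote the candidate if it actually beats the current model on recent data.
for name, model in [("current", current_model), ("retrained candidate", candidate_model)]:
    auc = roc_auc_score(y_holdout, model.predict_proba(X_holdout)[:, 1])
    print(f"{name}: ROC AUC on recent holdout = {auc:.3f}")
```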