Eitan Netzer & Oren Netzer - Real Time Machine Learning | PyData Global 2023
Learn how Core Sets and the Data Heroes Framework enable efficient real-time ML with automated retraining, distributed processing, and massive compute cost savings.
- Core sets provide a weighted subset of the data that preserves its statistical properties while significantly reducing training time and compute costs
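
To make the idea of a weighted subset concrete, here is a minimal sketch in plain Python. It is not the Data Heroes construction (the talk summary does not describe one); it swaps in a deliberately simple bucketing scheme, with the hypothetical helpers `build_weighted_subset` and `weighted_mean`, to show how a small set of weighted points can reproduce a statistic of the full data.

```python
import random
from statistics import fmean


def build_weighted_subset(values, bucket_size):
    """Compress values into (representative, weight) pairs.

    Illustrative stand-in only: sort the values, cut them into contiguous
    buckets, and keep each bucket's mean with the bucket's size as its weight.
    """
    ordered = sorted(values)
    subset = []
    for start in range(0, len(ordered), bucket_size):
        bucket = ordered[start:start + bucket_size]
        subset.append((fmean(bucket), len(bucket)))
    return subset


def weighted_mean(subset):
    """Mean of the representatives, each counted `weight` times."""
    total_weight = sum(w for _, w in subset)
    return sum(x * w for x, w in subset) / total_weight


random.seed(0)
data = [random.gauss(50.0, 10.0) for _ in range(10_000)]  # synthetic data
subset = build_weighted_subset(data, bucket_size=100)

full_mean = fmean(data)
subset_mean = weighted_mean(subset)
print(f"{len(data)} points -> {len(subset)} weighted points")
print(f"mean difference: {abs(full_mean - subset_mean):.2e}")
```

The weights are what let 100 points stand in for 10,000: each representative counts as many times as the points it replaced. Practical core set constructions choose points and weights so that a model's training loss, not just the mean, is approximately preserved.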
- The Data Heroes Real-Time ML Framework enables:
  - Automated high-frequency model retraining
  - Training on multiple date ranges
  - Efficient hyperparameter tuning
  - Distributed processing across geographic locations
- Model retraining frequency improvements:
  - Monthly retraining saved 82% of compute costs
  - Weekly retraining improved accuracy by 22%
  - Daily retraining increased accuracy by up to 20%
- Core set tree structure benefits:
  - Infinitely scalable and distributed
  - Built once but usable multiple times
  - Allows training on any subset of data
  - Processing time stays consistent as data grows
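
The tree properties listed above can be sketched with a merge-and-reduce structure. This is an assumption about the general shape of such a tree, not the framework's implementation: `SummaryTree` and `reduce_points` are hypothetical names, and the reduction step is a simple bucketing stand-in. Each leaf summarizes one chunk of data (say, one day), each parent merges and re-reduces its two children, and a query over any contiguous range of chunks reuses a handful of prebuilt nodes.

```python
import random


def reduce_points(points, max_size):
    """Shrink weighted (value, weight) pairs to at most max_size pairs.

    Stand-in reduction: sort, split into contiguous buckets, and keep each
    bucket's weighted mean together with the bucket's total weight.
    """
    if len(points) <= max_size:
        return list(points)
    ordered = sorted(points)
    step = -(-len(ordered) // max_size)  # ceiling division
    reduced = []
    for start in range(0, len(ordered), step):
        bucket = ordered[start:start + step]
        weight = sum(w for _, w in bucket)
        reduced.append((sum(x * w for x, w in bucket) / weight, weight))
    return reduced


class SummaryTree:
    """Binary tree of summaries over a list of data chunks."""

    def __init__(self, chunks, max_size):
        self.n = len(chunks)
        self.max_size = max_size
        self.nodes = {}  # (lo, hi) -> summary of chunks[lo:hi]
        self._build(chunks, 0, self.n)

    def _build(self, chunks, lo, hi):
        if hi - lo == 1:
            raw = [(x, 1.0) for x in chunks[lo]]
            summary = reduce_points(raw, self.max_size)
        else:
            mid = (lo + hi) // 2
            left = self._build(chunks, lo, mid)
            right = self._build(chunks, mid, hi)
            summary = reduce_points(left + right, self.max_size)
        self.nodes[(lo, hi)] = summary
        return summary

    def query(self, start, stop):
        """Return (summary, nodes_used) covering chunks[start:stop]."""
        covering = []
        self._collect(0, self.n, start, stop, covering)
        merged = [p for key in covering for p in self.nodes[key]]
        return merged, len(covering)

    def _collect(self, lo, hi, start, stop, out):
        if stop <= lo or hi <= start:
            return  # node lies entirely outside the requested range
        if start <= lo and hi <= stop:
            out.append((lo, hi))  # node lies entirely inside: reuse it
            return
        mid = (lo + hi) // 2
        self._collect(lo, mid, start, stop, out)
        self._collect(mid, hi, start, stop, out)


random.seed(1)
days = [[random.gauss(day, 1.0) for _ in range(200)] for day in range(16)]
tree = SummaryTree(days, max_size=32)  # built once

summary, nodes_used = tree.query(3, 11)  # any range of days, no rebuild
raw = [x for day in days[3:11] for x in day]
approx_mean = sum(x * w for x, w in summary) / sum(w for _, w in summary)
print(f"{len(raw)} raw points -> {len(summary)} weighted points "
      f"from {nodes_used} nodes")
```

Because every node holds at most `max_size` points, the amount of data handed to training depends on the number of nodes a query touches rather than on the raw data volume, which is one way to read the claim that processing time stays consistent as data grows. Leaves can also be built independently, which is what makes the structure amenable to distributed construction.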
- Real-world case study results:
  - Reduced hyperparameter tuning time from 154 hours to 5 hours
  - Maintained or improved model accuracy vs full dataset training
  - Enabled expansion from 144 to 864 hyperparameter combinations
  - Achieved 11% average accuracy increase over 26 weeks
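
For scale, the tuning figures above work out as follows. The snippet only restates the reported numbers as ratios; it does not assume which grid size each timing was measured on, since the summary does not say.

```python
full_hours = 154        # reported tuning time before
core_set_hours = 5      # reported tuning time after
combos_before = 144     # reported hyperparameter combinations before
combos_after = 864      # reported hyperparameter combinations after

speedup = full_hours / core_set_hours
hours_saved = full_hours - core_set_hours
time_reduction_pct = 100 * hours_saved / full_hours
grid_growth = combos_after / combos_before

print(f"{speedup:.1f}x faster ({time_reduction_pct:.1f}% less time)")
print(f"{grid_growth:.0f}x more hyperparameter combinations")
```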
- Implementation features:
  - No data needs to leave the local environment
  - Works with existing ML libraries (XGBoost, LightGBM, etc.)
  - Supports both classification and regression
  - Automated data structure conversion
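
One plausible reason a weighted subset fits existing libraries is that many learners already accept per-sample weights (for example, the `sample_weight` argument to `fit` in scikit-learn-style estimators, including the XGBoost and LightGBM wrappers). How the framework itself passes weights is not described in this summary, so the following is only a dependency-free sketch: a hypothetical `fit_line` function does weighted least squares, and the "subset" is the trivial case of unique rows weighted by how often they occur.

```python
from collections import Counter


def fit_line(xs, ys, weights=None):
    """Weighted least-squares fit of y = slope * x + intercept."""
    if weights is None:
        weights = [1.0] * len(xs)
    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    cov = sum(w * (x - mean_x) * (y - mean_y)
              for w, x, y in zip(weights, xs, ys))
    var = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    slope = cov / var
    return slope, mean_y - slope * mean_x


# Synthetic full dataset: 1,000 rows, but only three distinct (x, y) pairs.
full = [(1.0, 2.0)] * 500 + [(2.0, 4.5)] * 300 + [(3.0, 5.5)] * 200

# Weighted subset: each distinct row once, weighted by its multiplicity.
counts = Counter(full)
sub_xs = [x for x, _ in counts]
sub_ys = [y for _, y in counts]
sub_weights = [float(counts[row]) for row in counts]

full_fit = fit_line([x for x, _ in full], [y for _, y in full])
subset_fit = fit_line(sub_xs, sub_ys, sub_weights)
print(f"rows: {len(full)} full vs {len(sub_xs)} weighted")
print(f"slope difference: {abs(full_fit[0] - subset_fit[0]):.2e}")
```

Here the match is exact because the weights are true multiplicities; a real core set trades a small, bounded approximation error for a much larger reduction on data without duplicates.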