Guillaume Lemaitre - Get the best from your scikit-learn classifier | PyData Global 2023
Discover strategies to improve scikit-learn classifier performance, including tuning, calibration, and proper scoring rules, and learn how to optimize for business metrics to achieve reliable and effective model results.
- Resampling is not a good approach for class imbalance: It changes the class distribution the model is trained on rather than addressing the actual decision problem, and it can make results worse.
- Tuning the model is important: Grid search and hyperparameter tuning are crucial to optimize the model.
- Use proper scoring rules: Log loss, Brier score, and other proper scoring rules evaluate the predicted probabilities themselves, unlike accuracy, precision, and recall, which depend on a fixed decision threshold.
- Business metrics are important: Define a business metric that aligns with the problem you’re trying to solve and optimize for that.
- Calibration is key: Make sure the model is well calibrated, so that its predicted probabilities match the observed frequencies.
- Random forest can be improved: Balanced random forest can be a good approach to handle class imbalance.
- Resampling can be problematic: It can distort the calibration of the model’s predicted probabilities and lead to overfitting.
- Grid search can be useful: Use grid search to tune the hyperparameters of the model.
- Thresholding is important: Tune the decision threshold, rather than relying on the default of 0.5, to optimize the model for the specific problem.
- Resampling is not necessary: If you’re using a well-calibrated model, resampling may not be necessary.
- Use metadata: Use additional per-sample information (metadata) when defining the business metric that the model is optimized for.
- Imbalanced classification is a problem: Imbalanced classification can lead to overfitting and poor performance.
- Proper calibration is important: It makes the predicted probabilities reliable enough to base decisions on.
- Business metrics are the goal: The goal is to optimize the business metric, not just the accuracy of the model.
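
Several of the points above (tuning against a proper scoring rule, leaving the data unresampled, and tuning the decision threshold for a business metric) can be sketched together as follows. This is a minimal illustration, not the speaker's code: the synthetic dataset, the logistic regression model, the parameter grid, and the gain and cost values used in `business_gain` are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, log_loss
from sklearn.model_selection import GridSearchCV, cross_val_predict, train_test_split

# Synthetic imbalanced data (roughly 5% positives); no resampling is applied.
X, y = make_classification(
    n_samples=4000, n_features=10, weights=[0.95, 0.05], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.5, random_state=0
)

# Tune the regularization strength against a proper scoring rule (log loss).
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},  # assumed grid
    scoring="neg_log_loss",
    cv=5,
)
search.fit(X_train, y_train)

# Evaluate the predicted probabilities with proper scoring rules.
test_proba = search.predict_proba(X_test)[:, 1]
test_log_loss = log_loss(y_test, test_proba)
test_brier = brier_score_loss(y_test, test_proba)

# Hypothetical business metric: gain per true positive, cost per error.
GAIN_TP, COST_FP, COST_FN = 100.0, 10.0, 100.0  # assumed values


def business_gain(y_true, y_pred):
    """Total gain of hard predictions under the assumed gains and costs."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return GAIN_TP * tp - COST_FP * fp - COST_FN * fn


# Out-of-fold probabilities on the training set, so the threshold is not
# chosen on data the model has already seen, nor on the test set.
cv_proba = cross_val_predict(
    search.best_estimator_, X_train, y_train, cv=5, method="predict_proba"
)[:, 1]

# Scan candidate thresholds 0.01, 0.02, ..., 0.99 and keep the best one.
thresholds = np.arange(1, 100) / 100
gains = [business_gain(y_train, (cv_proba >= t).astype(int)) for t in thresholds]
best_threshold = thresholds[int(np.argmax(gains))]

tuned_gain_cv = max(gains)
default_gain_cv = business_gain(y_train, (cv_proba >= 0.5).astype(int))

# Apply the tuned threshold to the held-out test set.
test_pred = (test_proba >= best_threshold).astype(int)
print(f"log loss={test_log_loss:.3f}  Brier={test_brier:.3f}")
print(f"threshold={best_threshold:.2f}  test gain={business_gain(y_test, test_pred):.0f}")
```

The manual threshold scan keeps the sketch independent of the installed scikit-learn version. scikit-learn 1.5 and later also provide `TunedThresholdClassifierCV`, which performs a comparable cross-validated threshold search for a chosen scorer.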