Olivier Grisel - Predictive survival analysis with scikit-learn, scikit-survival and lifelines
Learn how to perform predictive survival analysis in Python using scikit-learn, scikit-survival & lifelines. Master key concepts, models & applications in this tutorial.
-
Survival analysis deals with right-censored time-to-event data, where some observations don't experience the event during the study period, so for them only a lower bound on the event time is known
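To make the censoring setup concrete, here is a minimal NumPy sketch. It assumes hypothetical exponential event times and independent uniform censoring times (neither comes from the talk). Each row records only the earlier of the two times plus an event indicator. This (time, event) pair is the kind of target that lifelines and scikit-survival work with.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Hypothetical example: true event times (e.g. days until a machine fails)
# and independent censoring times (e.g. when each unit leaves the study window).
event_time = rng.exponential(scale=100.0, size=n)
censoring_time = rng.uniform(0.0, 150.0, size=n)

# What is actually observed: the earlier of the two times, plus an indicator
# saying whether that time is an event (True) or a censoring (False).
observed_time = np.minimum(event_time, censoring_time)
event_observed = event_time <= censoring_time

print(f"{(~event_observed).mean():.0%} of observations are censored")
```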
-
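The Kaplan-Meier estimator provides a consistent (asymptotically unbiased) estimate of survival probabilities even with censored data, assuming censoring is independent of the event time. It serves as a baseline model that does not condition on any features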
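The Kaplan-Meier estimator uses the product-limit formula S(t) = product over event times t_i <= t of (1 - d_i / n_i). Here d_i is the number of events at t_i, and n_i is the number of subjects still at risk just before t_i. The sketch below implements that formula directly in NumPy for illustration only. `kaplan_meier` is a hypothetical helper. In practice you would use lifelines' `KaplanMeierFitter` or scikit-survival's `sksurv.nonparametric.kaplan_meier_estimator`.

```python
import numpy as np

def kaplan_meier(time, event):
    """Return the distinct event times and the Kaplan-Meier estimate of S(t) at each of them."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event])  # censored times are not steps of the curve
    survival = np.empty(len(event_times))
    s = 1.0
    for i, t in enumerate(event_times):
        at_risk = np.sum(time >= t)           # still under observation just before t
        deaths = np.sum((time == t) & event)  # events occurring exactly at t
        s *= 1.0 - deaths / at_risk
        survival[i] = s
    return event_times, survival

# Six subjects; the one at t=2 with event=False and the one at t=4 are censored.
times, surv = kaplan_meier([1, 2, 2, 3, 4, 5], [True, True, False, True, False, True])
print(times)  # [1. 2. 3. 5.]
print(surv)   # approximately [0.833 0.667 0.444 0.   ]
```

A subject censored at time t still counts as at risk for an event at that same t. This is the usual convention for ties between events and censorings.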
-
Two key metrics for evaluating survival models:
- Integrated Brier Score (IBS): measures both calibration and discrimination
- Concordance index: measures ranking (discriminative) ability only
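The concordance index looks at comparable pairs, meaning pairs where the subject with the shorter observed time actually had the event. It reports the fraction of those pairs in which the model gave the higher risk score to the subject who failed first. The NumPy sketch below is a simplified Harrell's C that ignores pairs with tied times. `harrell_c_index` is a hypothetical helper. scikit-survival provides `sksurv.metrics.concordance_index_censored` for this metric and `sksurv.metrics.integrated_brier_score` for the IBS.

```python
import numpy as np

def harrell_c_index(time, event, risk_score):
    """Fraction of comparable pairs whose predicted risks are correctly ordered (ties in risk count 0.5)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk_score, dtype=float)
    concordant = 0.0
    comparable = 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a censored subject cannot be the earlier member of a comparable pair
        later = time > time[i]  # subjects still event-free after i's event (tied times ignored)
        comparable += int(later.sum())
        concordant += np.sum(risk[i] > risk[later]) + 0.5 * np.sum(risk[i] == risk[later])
    return concordant / comparable

# Higher risk score for earlier failures gives a perfect ranking.
print(harrell_c_index([1, 2, 3, 4], [True, True, True, True], [4, 3, 2, 1]))  # 1.0
```

A constant prediction scores 0.5, and a perfectly reversed ranking scores 0.0. The C-index only checks ordering, which is why it says nothing about calibration.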
-
Cox Proportional Hazards is a popular predictive model for survival analysis, but its proportional hazards assumption brings limitations: for example, the predicted survival curves of two individuals can never cross
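The no-crossing limitation follows from the form of the model. The Cox model predicts S(t | x) = S0(t) ** exp(x . beta), where S0 is a baseline survival curve shared by everyone. Because 0 <= S0(t) <= 1, a larger linear predictor x . beta gives a lower curve at every t. The sketch below checks this numerically. The baseline curve, coefficients, and feature vectors are all made-up assumptions, not fitted values.

```python
import numpy as np

# Hypothetical baseline survival curve S0(t) on a time grid (any decreasing curve from 1 works).
t = np.linspace(0.0, 10.0, 101)
baseline_survival = np.exp(-0.1 * t ** 1.5)

# Hypothetical coefficients and feature vectors for two individuals.
beta = np.array([0.8, -0.5])
x_a = np.array([1.0, 0.0])  # linear predictor 0.8
x_b = np.array([0.0, 1.0])  # linear predictor -0.5

def cox_survival(x):
    # Proportional hazards: S(t | x) = S0(t) ** exp(x @ beta)
    return baseline_survival ** np.exp(x @ beta)

s_a, s_b = cox_survival(x_a), cox_survival(x_b)

# The individual with the higher linear predictor is below the other at every t,
# so the two predicted curves never cross.
print(np.all(s_a <= s_b))
```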
-
More flexible models available:
- Gradient Boosting Incidence
- Random Survival Forests
- Both can capture non-linear effects and interactions between features
-
Key Python libraries for survival analysis:
- lifelines: Core survival analysis functionality
- scikit-survival: Extension of scikit-learn for survival
- hazardous: Experimental library with newer models
-
Naive approaches, such as discarding censored observations or imputing their unknown event times with a large value, introduce significant bias
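A small simulation illustrates the bias. It assumes exponential event times with mean 100 and independent uniform censoring. These hypothetical settings are chosen so that the true S(50) = exp(-0.5) is known exactly. The code estimates the probability of surviving past t = 50 with both naive approaches. It then compares them with a Kaplan-Meier estimate computed from the same data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process (exponential events, uniform censoring).
event_time = rng.exponential(scale=100.0, size=n)
censoring_time = rng.uniform(0.0, 150.0, size=n)
observed_time = np.minimum(event_time, censoring_time)
event_observed = event_time <= censoring_time

t_query = 50.0
true_survival = np.exp(-t_query / 100.0)  # exact S(50) for this exponential distribution

# Naive 1: drop censored rows, then count survivors among the remaining ones.
drop_censored = np.mean(observed_time[event_observed] > t_query)

# Naive 2: replace each censored time with a very large value ("never fails").
imputed = np.where(event_observed, observed_time, 1e6)
impute_large = np.mean(imputed > t_query)

# Kaplan-Meier at t_query (vectorised; assumes no tied times, true for continuous draws).
order = np.argsort(observed_time)
t_sorted, e_sorted = observed_time[order], event_observed[order]
at_risk = n - np.arange(n)                       # subjects with time >= t_sorted[i]
factors = np.where(e_sorted, 1.0 - 1.0 / at_risk, 1.0)
km_survival = np.prod(factors[t_sorted <= t_query])

print(f"true {true_survival:.3f} | drop censored {drop_censored:.3f} | "
      f"impute large {impute_large:.3f} | Kaplan-Meier {km_survival:.3f}")
```

With these settings, dropping censored rows underestimates S(50), because the events that remain observed are biased toward short times. Imputing a large value overestimates S(50). The Kaplan-Meier estimate stays close to the true value.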
-
Survival analysis has applications in:
- Medical research (patient survival)
- Predictive maintenance
- Customer churn
- Insurance claim modeling
-
The hazard rate represents the instantaneous risk of the event occurring at time t, conditional on having survived up to t
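Take a continuous event time T with density f and survival function S. These symbols are introduced here for illustration. The hazard rate and its link to the survival curve can be written as:

```latex
h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)},
\qquad
S(t) = P(T > t) = \exp\left(-\int_0^t h(u)\,du\right)
```

For example, a constant hazard h(t) = λ gives S(t) = exp(-λt), which is the exponential distribution with mean 1/λ.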
-
Feature preprocessing like splines and polynomial features can help capture non-linear relationships in survival models