Olivier Grisel - Predictive survival analysis with scikit-learn, scikit-survival and lifelines


Learn how to perform predictive survival analysis in Python using scikit-learn, scikit-survival & lifelines. Master key concepts, models & applications in this tutorial.

Key takeaways

Survival analysis deals with right-censored time-to-event data, where some observations don’t experience the event during the study period
The Kaplan-Meier estimator gives a consistent estimate of survival probabilities even with censored data (assuming censoring is independent of the event time), and serves as a baseline unconditional model that ignores individual features
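
To make the Kaplan-Meier idea concrete, here is a minimal pure-Python sketch of the product-limit formula. The function name `kaplan_meier` and the toy data are made up for illustration and are not taken from the talk. In practice you would more likely use a library estimator such as `KaplanMeierFitter` from lifelines.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct observed event time.

    durations: observed time per individual (event time or censoring time)
    events: True if the event was observed, False if right-censored
    """
    # Number of observed events at each distinct time.
    deaths = Counter(t for t, e in zip(durations, events) if e)
    survival = 1.0
    curve = []
    for t in sorted(deaths):
        # Individuals still under observation just before time t.
        # Censored individuals count here until their censoring time.
        at_risk = sum(1 for d in durations if d >= t)
        survival *= 1.0 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical toy data: 6 individuals, 2 of them right-censored.
durations = [1, 2, 2, 3, 4, 5]
events = [True, True, False, True, False, True]
km_curve = kaplan_meier(durations, events)
```

Censored individuals are never counted as events, but they stay in the at-risk denominator until they leave the study. This is how the estimator uses their partial information instead of discarding it.
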
Two key metrics for evaluating survival models:
- Integrated Brier Score (IBS) - measures calibration and discrimination
- Concordance Index - measures ranking/discriminative ability only
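
As an illustration of what "ranking ability only" means, the following is a simplified sketch of Harrell's concordance index in pure Python. The function name and the toy data are assumptions for this example, and tied durations are simply ignored. Library implementations (for instance in lifelines and scikit-survival) handle such edge cases more carefully.

```python
def concordance_index(durations, events, risk_scores):
    """Fraction of comparable pairs that the risk scores rank correctly.

    A pair (i, j) is comparable when individual i had an observed event
    strictly before the observed time of individual j. The pair is
    concordant when i was given the higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(durations)
    for i in range(n):
        if not events[i]:
            # A censored individual cannot be the earlier event of a pair.
            continue
        for j in range(n):
            if durations[i] < durations[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions get half credit
    return concordant / comparable


# Hypothetical toy data: the third individual is right-censored.
durations = [1, 2, 3, 4]
events = [True, True, False, True]
risk_scores = [0.9, 0.6, 0.7, 0.1]
c_index = concordance_index(durations, events, risk_scores)
```

Only the order of `risk_scores` matters here: any monotonically increasing rescaling of the scores gives the same value, which is why this metric says nothing about calibration.
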
Cox Proportional Hazards is a popular predictive model for survival analysis, but its proportional-hazards assumption is restrictive: for example, the predicted survival curves of two individuals can never cross
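
The non-crossing property follows from the form of the Cox model, where each individual's survival curve is the baseline curve raised to a constant power: S(t | x) = S0(t) ** exp(beta . x). The sketch below checks this numerically. The baseline values and linear predictors are made-up numbers, not fitted estimates.

```python
import math


def cox_survival(baseline_survival, linear_predictor):
    """Survival curve under proportional hazards: S0(t) ** exp(beta . x)."""
    return [s0 ** math.exp(linear_predictor) for s0 in baseline_survival]


# Hypothetical baseline survival S0(t) evaluated on an increasing time grid.
baseline = [1.0, 0.9, 0.7, 0.4, 0.2]

# Two individuals that differ only by their linear predictor beta . x.
low_risk = cox_survival(baseline, -0.5)
high_risk = cox_survival(baseline, 1.0)

# The low-risk curve stays at or above the high-risk curve at every time.
never_cross = all(lo >= hi for lo, hi in zip(low_risk, high_risk))
```

Because the exponent does not depend on time, the ordering of two individuals is fixed once and for all, whatever the fitted coefficients are.
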
More flexible models are available that can capture non-linear interactions between features:
- Gradient Boosting Incidence
- Survival Forests
Key Python libraries for survival analysis:
- lifelines: Core survival analysis functionality
- scikit-survival: Extension of scikit-learn for survival
- hazardous: Experimental library with newer models
Naive approaches like discarding censored data or imputing with large values introduce significant bias
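
A small deterministic example can show the direction of these biases. The numbers below are invented: the true durations would not be known in a real study, and the placeholder value 100 stands in for an arbitrary "large" imputation.

```python
def mean(values):
    return sum(values) / len(values)


# Hypothetical ground truth, normally unobserved for censored individuals.
true_durations = [2, 4, 6, 8, 10]
study_end = 7  # observation stops here, so later events are right-censored

observed = [min(t, study_end) for t in true_durations]
event = [t <= study_end for t in true_durations]

true_mean = mean(true_durations)

# Naive approach 1: drop censored rows. Only short durations remain.
drop_censored_mean = mean([t for t, e in zip(observed, event) if e])

# Naive approach 2: replace censored durations with an arbitrary large value.
impute_large_mean = mean([t if e else 100 for t, e in zip(observed, event)])
```

Here the true mean is 6.0, dropping censored rows gives 4.0 (too pessimistic), and imputing 100 gives 42.4 (too optimistic, and entirely driven by the chosen placeholder).
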
Survival analysis has applications in:
- Medical research (patient survival)
- Predictive maintenance
- Customer churn
- Insurance claim modeling
The hazard rate represents the instantaneous risk of event occurrence, conditional on survival up to that point
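
For reference, the standard textbook definition behind this statement can be written as follows, with T the random event time, h the hazard rate and S the survival function (notation chosen here, not taken from the talk):

```latex
h(t) = \lim_{\mathrm{d}t \to 0}
       \frac{P(t \le T < t + \mathrm{d}t \mid T \ge t)}{\mathrm{d}t},
\qquad
S(t) = P(T > t) = \exp\!\left(-\int_0^t h(u)\,\mathrm{d}u\right)
```

The second identity shows that knowing the hazard rate at all times is equivalent to knowing the survival curve.
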
Feature preprocessing like splines and polynomial features can help capture non-linear relationships in survival models
