Darren Vengroff - An Introduction to Impact Charts | SciPy 2024
Learn how impact charts visualize feature effects using Shapley values, combining ML predictions with causal modeling to reveal hidden patterns and assess feature importance.
- Impact charts are a visualization technique that shows how different features affect a target variable by plotting distributions of Shapley values
- The method combines three key areas: machine learning predictions, interpretable ML techniques, and social science causal modeling approaches
- Key advantages: the method is nonparametric (it makes no assumption about whether relationships are linear or nonlinear) and robust to irrelevant variables
- Implementation typically uses:
  - An ensemble of 50-100 independent ML models
  - Each model trained on a different sample of the data
  - XGBoost as the underlying model
  - Distributions of Shapley values plotted vertically
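The ensemble recipe above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the talk's implementation: a small k-nearest-neighbour regressor stands in for XGBoost, Shapley values are computed exactly by enumerating feature orderings against a mean-value baseline (practical only for a handful of features), the ensemble has 10 members rather than 50-100, and the data are synthetic. The names `knn_predict`, `exact_shapley`, and `ensemble_impacts` are hypothetical helpers.

```python
import itertools
import random
import statistics


def knn_predict(train_X, train_y, x, k=5):
    """Average the targets of the k training rows closest to x.

    Assumption: k-NN stands in for XGBoost so the sketch needs only
    the standard library.
    """
    nearest = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )[:k]
    return sum(train_y[i] for i in nearest) / k


def exact_shapley(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline row.

    Averages each feature's marginal contribution over every ordering
    of the features, so the cost grows factorially with feature count.
    """
    n = len(x)
    phi = [0.0] * n
    orderings = list(itertools.permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for j in order:
            current[j] = x[j]  # switch feature j from baseline to its real value
            value = predict(current)
            phi[j] += value - previous
            previous = value
    return [p / len(orderings) for p in phi]


def ensemble_impacts(X, y, n_models=10, seed=0):
    """Return impacts[j][i]: one Shapley value per model for feature j of row i."""
    rng = random.Random(seed)
    n_rows, n_features = len(X), len(X[0])
    baseline = [statistics.fmean(row[j] for row in X) for j in range(n_features)]
    impacts = [[[] for _ in range(n_rows)] for _ in range(n_features)]
    for _ in range(n_models):
        # Each ensemble member is fit on its own bootstrap resample.
        idx = [rng.randrange(n_rows) for _ in range(n_rows)]
        sample_X = [X[i] for i in idx]
        sample_y = [y[i] for i in idx]

        def predict(row):
            return knn_predict(sample_X, sample_y, row)

        for i, row in enumerate(X):
            for j, phi in enumerate(exact_shapley(predict, row, baseline)):
                impacts[j][i].append(phi)
    return impacts


# Synthetic data: the target depends on feature 0 only; feature 1 is irrelevant.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(80)]
y = [3.0 * x0 + data_rng.gauss(0.0, 0.1) for x0, _ in X]

impacts = ensemble_impacts(X, y)
for j, feature in enumerate(impacts):
    mean_abs = statistics.fmean(abs(v) for point in feature for v in point)
    print(f"feature {j}: mean |Shapley value| across ensemble = {mean_abs:.3f}")
```

To turn this into a chart for feature `j`, one would plot, for each row `i`, the values in `impacts[j][i]` as a vertical spread above that row's feature value `X[i][j]`.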
- Useful for:
  - Finding hidden patterns in data
  - Understanding feature importance
  - Detecting bias in ML systems
  - Analyzing both categorical and continuous variables
  - Analyzing aggregate data without raising privacy concerns
- Applied successfully to eviction data:
  - Revealed racial disparities in eviction rates
  - Showed nonlinear relationships between income and eviction rates
  - Identified stronger effects than traditional regression methods found
- Error bars and distribution widths help assess confidence in detected effects
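One way to read a distribution width as a confidence signal is to summarize each data point's Shapley values across the ensemble with a mean and a percentile band. The sketch below is an assumption-laden illustration: the 5th-95th percentile band and the "band excludes zero" rule are choices made here, not something the talk specifies, and `summarize_impact` is a hypothetical helper.

```python
import statistics


def summarize_impact(values, lower_pct=5, upper_pct=95):
    """Mean and percentile band for one point's Shapley values across an ensemble.

    Assumption: a band that does not straddle zero is treated as a
    consistently signed effect; the cutoffs are illustrative.
    """
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return {
        "mean": statistics.fmean(values),
        "low": low,
        "high": high,
        "excludes_zero": low > 0 or high < 0,
    }


# Made-up values for two points: a narrow spread and a wide one that straddles zero.
narrow = summarize_impact([0.40, 0.50, 0.60, 0.50, 0.45])
wide = summarize_impact([-0.30, 0.20, 0.10, -0.10, 0.00])
print(narrow)
print(wide)
```

A narrow band far from zero suggests the ensemble members agree on the direction of the effect, while a wide band that straddles zero suggests the effect may be noise.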
- Available as an open-source package:
  - Simple to use (3-5 lines of code)
  - Supports analysis of custom data
  - Includes hyperparameter optimization options
- Particularly valuable for social science applications, where causal relationships matter but traditional regression may miss complex patterns