Darren Vengroff - An Introduction to Impact Charts | SciPy 2024

Learn how impact charts visualize feature effects using Shapley values, combining ML predictions with causal modeling to reveal hidden patterns and assess feature importance.

Key takeaways
  • Impact charts are a visualization technique that shows how different features affect a target variable by plotting distributions of Shapley values

  • The method combines three key areas: machine learning predictions, interpretable ML techniques, and social science causal modeling approaches

  • Key advantages: the method is nonparametric (it makes no assumption about whether a feature's relationship to the target is linear or nonlinear) and robust to irrelevant variables

  • Implementation typically uses:

    • Ensemble of 50-100 independent ML models
    • Training each model on different data samples
    • XGBoost as the underlying model
    • Plotting the distribution of Shapley values across the ensemble vertically
  • Useful for:

    • Finding hidden patterns in data
    • Understanding feature importance
    • Detecting bias in ML systems
    • Analyzing categorical and continuous variables
    • Analyzing aggregate data, which avoids the privacy concerns of individual-level records
  • Applied successfully to eviction data analysis:

    • Revealed racial disparities in eviction rates
    • Showed nonlinear relationships with income
    • Identified stronger effects than traditional regression methods detected
  • Error bars and distribution widths help assess confidence in detected effects

  • Available as open-source package:

    • Simple to use (3-5 lines of code)
    • Supports custom data analysis
    • Includes hyperparameter optimization options
  • Particularly valuable for social science applications where causal relationships are important but traditional regression may miss complex patterns
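
The ensemble recipe in the takeaways (many models, each trained on a different data sample, with Shapley values collected per feature) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the open-source package's API: a tiny gradient-boosted-stumps regressor stands in for XGBoost, Shapley values are computed by brute-force enumeration rather than a tree-specific algorithm, the ensemble is much smaller than 50-100 models, and every function name here (fit_boosted_stumps, shapley_values, impact_distributions, make_data) is hypothetical. The synthetic data has one feature with a nonlinear effect and one irrelevant feature.

```python
import itertools
import math
import random
import statistics


def fit_boosted_stumps(X, y, rounds=20, learning_rate=0.3):
    """Fit a tiny gradient-boosted-stumps regressor (a stand-in for XGBoost)."""
    base = statistics.fmean(y)
    residuals = [target - base for target in y]
    stumps = []  # each stump: (feature index, threshold, left value, right value)
    for _ in range(rounds):
        best = None
        for j in range(len(X[0])):
            values = sorted({row[j] for row in X})
            step = max(1, len(values) // 10)  # try roughly ten thresholds per feature
            for threshold in values[::step]:
                left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
                right = [r for row, r in zip(X, residuals) if row[j] > threshold]
                if not left or not right:
                    continue
                lm, rm = statistics.fmean(left), statistics.fmean(right)
                sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
                if best is None or sse < best[0]:
                    best = (sse, j, threshold, learning_rate * lm, learning_rate * rm)
        if best is None:  # no valid split exists (all features constant)
            break
        _, j, threshold, lv, rv = best
        stumps.append((j, threshold, lv, rv))
        residuals = [r - (lv if row[j] <= threshold else rv)
                     for row, r in zip(X, residuals)]

    def predict(row):
        return base + sum(lv if row[j] <= t else rv for j, t, lv, rv in stumps)

    return predict


def shapley_values(predict, row, background):
    """Exact Shapley values for one row, enumerating all feature subsets."""
    n = len(row)

    def value(subset):
        # Mean prediction with features in `subset` fixed to the row's values
        # and the remaining features taken from each background row.
        return statistics.fmean(
            [predict([row[k] if k in subset else b[k] for k in range(n)])
             for b in background]
        )

    phi = [0.0] * n
    for j in range(n):
        others = [k for k in range(n) if k != j]
        for size in range(n):
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                phi[j] += weight * (value(set(subset) | {j}) - value(set(subset)))
    return phi


def impact_distributions(X, y, points, n_models=15, sample_fraction=0.7, seed=0):
    """Return impacts[j][i]: one Shapley value per model for feature j at points[i]."""
    rng = random.Random(seed)
    impacts = [[[] for _ in points] for _ in range(len(X[0]))]
    for _ in range(n_models):
        # Each model sees a different random subsample of the data.
        sample = rng.sample(range(len(X)), int(sample_fraction * len(X)))
        X_s, y_s = [X[i] for i in sample], [y[i] for i in sample]
        predict = fit_boosted_stumps(X_s, y_s)
        background = X_s[:10]  # small background set keeps the enumeration cheap
        for i, row in enumerate(points):
            for j, phi in enumerate(shapley_values(predict, row, background)):
                impacts[j][i].append(phi)
    return impacts


def make_data(n=80, seed=1):
    """Synthetic data: y depends nonlinearly on x0; x1 is irrelevant."""
    rng = random.Random(seed)
    X = [[rng.uniform(-2, 2), rng.uniform(-2, 2)] for _ in range(n)]
    y = [row[0] ** 2 + rng.gauss(0, 0.2) for row in X]
    return X, y


if __name__ == "__main__":
    X, y = make_data()
    points = X[:10]
    impacts = impact_distributions(X, y, points)
    for j, per_point in enumerate(impacts):
        for row, values in zip(points, per_point):
            print(f"x{j}={row[j]:+.2f}  mean impact={statistics.fmean(values):+.2f}  "
                  f"spread={statistics.pstdev(values):.2f}")
```

Each inner list in the result holds one Shapley value per model for a single feature at a single data point. Drawing those values as a vertical spread above the feature value would give the kind of chart the takeaways describe; the plotting step is omitted here to keep the sketch dependency-free. A narrow spread suggests the models agree on the effect, which is how distribution width can inform confidence.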