Hajime Takeda - Introduction to Causal Inference with Machine Learning | SciPy 2024

Learn how causal machine learning reveals true cause-effect relationships in your data. Discover meta-learners, uplift modeling, and practical applications across industries.

Key takeaways
  • Causal inference with machine learning aims to determine cause-effect relationships, whereas traditional ML focuses on correlation-based prediction

  • Randomized Controlled Trials (RCTs) are the gold standard for causal inference but aren’t always feasible due to ethical, time, or cost constraints

  • Two main techniques for causal machine learning:

    • Meta-learners (S-learner, T-learner, X-learner, doubly robust DR-learner, double machine learning or DML)
    • Uplift modeling (decision tree-based approach)
  • Meta-learners use machine learning models to:

    • Predict outcomes for treated and untreated groups
    • Measure treatment effects at individual and group levels
    • Handle complex, high-dimensional data
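
To make the meta-learner idea concrete, here is a minimal T-learner sketch: one outcome model per treatment arm, with the difference in predictions used as the per-unit effect estimate. The synthetic data, the variable names, and the helpers `fit_linear` and `predict_linear` are assumptions for illustration and do not come from the talk. A NumPy least-squares fit stands in for the machine learning regressor you would normally plug in.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): one covariate x, a randomized
# binary treatment t, and a true treatment effect of 1 + 2*x.
n = 2000
x = rng.uniform(0, 1, size=n)
t = rng.integers(0, 2, size=n)
y = 3.0 * x + t * (1.0 + 2.0 * x) + rng.normal(0, 0.1, size=n)


def fit_linear(features, target):
    """Least-squares fit with an intercept; a stand-in for any ML regressor."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


def predict_linear(coef, features):
    """Predict outcomes from coefficients returned by fit_linear."""
    design = np.column_stack([np.ones(len(features)), features])
    return design @ coef


# T-learner: fit one outcome model on treated units, another on untreated units.
model_treated = fit_linear(x[t == 1], y[t == 1])
model_control = fit_linear(x[t == 0], y[t == 0])

# Estimated effect per unit = predicted treated outcome - predicted untreated outcome.
cate_hat = predict_linear(model_treated, x) - predict_linear(model_control, x)

# Averaging the per-unit estimates gives a group-level (ATE) estimate.
ate_hat = cate_hat.mean()
print(f"estimated ATE: {ate_hat:.3f}")
```

Because the treatment is randomized in this toy setup, both models see comparable populations. With observational data, the covariates would also need to include the confounders.
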
  • Key treatment effect metrics:

    • Average Treatment Effect (ATE) - overall effect across population
    • Conditional Average Treatment Effect (CATE) - average effect within a specific segment of the population
    • Individual Treatment Effect (ITE) - effect at individual level
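
The three metrics can be written compactly in potential-outcomes notation. The notation is an assumption here, since the summary does not fix one: Y(1) and Y(0) are the outcomes a unit would have with and without treatment, and X denotes the covariates that define a segment.

```latex
\begin{align*}
\text{ITE:}\quad  & \tau_i = Y_i(1) - Y_i(0) \\
\text{ATE:}\quad  & \tau = \mathbb{E}\left[ Y(1) - Y(0) \right] \\
\text{CATE:}\quad & \tau(x) = \mathbb{E}\left[ Y(1) - Y(0) \mid X = x \right]
\end{align*}
```

Only one of Y_i(1) and Y_i(0) is ever observed for a given unit, so the ITE cannot be measured directly and has to be estimated.
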
  • Model validation approaches:

    • Predictive accuracy checks (RMSE, cross-validation)
    • Refutation methods (random common cause, placebo treatment)
    • Sensitivity analysis
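
The two refutation checks named above can be sketched by hand as follows. This is an illustrative NumPy version under assumed synthetic data, not the API of any particular library, and `estimate_ate` is a hypothetical helper. A placebo treatment should give an estimate near zero, and adding a random common cause should leave the original estimate almost unchanged.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data (assumed for illustration): true treatment effect is 2.0.
n = 5000
x = rng.normal(size=n)
t = rng.integers(0, 2, size=n)
y = 1.0 * x + 2.0 * t + rng.normal(size=n)


def estimate_ate(covariates, treatment, outcome):
    """Regression-adjusted ATE: the coefficient on the treatment column."""
    design = np.column_stack([np.ones(len(treatment)), covariates, treatment])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef[-1]


original = estimate_ate(x, t, y)

# Placebo treatment: shuffle the treatment labels so they carry no causal
# signal, then re-estimate. The estimates should center on zero.
placebo_estimates = np.array(
    [estimate_ate(x, rng.permutation(t), y) for _ in range(200)]
)
placebo_mean = placebo_estimates.mean()

# Random common cause: add an unrelated random covariate and re-estimate.
# The estimate should stay close to the original.
random_cause = rng.normal(size=n)
with_random_cause = estimate_ate(np.column_stack([x, random_cause]), t, y)

print(f"original:          {original:.3f}")
print(f"placebo (mean):    {placebo_mean:.3f}")
print(f"with random cause: {with_random_cause:.3f}")
```

If the placebo estimate stays far from zero, or the estimate moves noticeably once an irrelevant covariate is added, the model is likely picking up something other than the treatment effect.
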
  • Two major libraries for causal ML:

    • EconML (Microsoft) - broader range of algorithms
    • CausalML (Uber) - focused on marketing and uplift modeling
  • Common applications include:

    • Marketing (coupon effectiveness)
    • Healthcare (treatment outcomes)
    • Economics (job training programs)
    • Social interventions
  • Confounding variables must be identified and controlled for; otherwise results can be biased or even reversed, as in Simpson’s Paradox
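
A small worked example shows how a confounder can flip the sign of a naive comparison. All counts below are invented for illustration. They describe a hypothetical coupon campaign in which coupons were sent mostly to low spenders, who convert less often regardless of the coupon.

```python
# (segment, treated) -> (customers, conversions); invented counts.
data = {
    ("low_spender", True): (800, 80),
    ("low_spender", False): (200, 10),
    ("high_spender", True): (200, 80),
    ("high_spender", False): (800, 240),
}


def pooled_rate(treated):
    """Conversion rate for one treatment arm, ignoring the segment."""
    customers = sum(n for (_, flag), (n, _) in data.items() if flag == treated)
    conversions = sum(c for (_, flag), (_, c) in data.items() if flag == treated)
    return conversions / customers


# Naive comparison: treated vs. untreated over the whole population.
naive_effect = pooled_rate(True) - pooled_rate(False)

# Adjusted comparison: compute the effect within each segment, then average
# the segment effects weighted by segment size.
total_customers = sum(n for n, _ in data.values())
segment_effects = {}
adjusted_effect = 0.0
for segment in ("low_spender", "high_spender"):
    n_treated, c_treated = data[(segment, True)]
    n_control, c_control = data[(segment, False)]
    effect = c_treated / n_treated - c_control / n_control
    segment_effects[segment] = effect
    adjusted_effect += effect * (n_treated + n_control) / total_customers

print(f"naive effect:    {naive_effect:+.3f}")
print(f"segment effects: {segment_effects}")
print(f"adjusted effect: {adjusted_effect:+.3f}")
```

The pooled comparison suggests the coupon lowers conversion, yet it raises conversion in both segments. Spending level drives both who received the coupon and how likely they are to convert, which is exactly what makes it a confounder.
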

  • Recommended approach: start with simpler methods (S-learner/T-learner) before moving to more complex approaches
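
As one possible starting point, an S-learner can be sketched in a few lines: a single model takes the treatment flag as just another feature, and the effect is the difference between its predictions with the flag set to 1 and to 0. The synthetic data and the `design` helper are assumptions for illustration, and a linear least-squares fit again stands in for an ML regressor.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (assumed for illustration): constant true effect of 1.5.
n = 1000
x = rng.normal(size=n)
t = rng.integers(0, 2, size=n)
y = 2.0 * x + 1.5 * t + rng.normal(0, 0.1, size=n)


def design(covariate, treatment):
    """Feature matrix with an intercept, the covariate, and the treatment flag."""
    return np.column_stack([np.ones(len(covariate)), covariate, treatment])


# S-learner: a single model fitted on covariates and treatment together.
coef, *_ = np.linalg.lstsq(design(x, t), y, rcond=None)

# Predict every unit's outcome under treatment and under no treatment.
y1_hat = design(x, np.ones(n)) @ coef
y0_hat = design(x, np.zeros(n)) @ coef

s_learner_ate = (y1_hat - y0_hat).mean()
print(f"S-learner ATE estimate: {s_learner_ate:.3f}")
```

With a purely additive linear model like this one, every unit gets the same estimated effect. Capturing effects that vary across units would need interaction terms or a more flexible base model.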