Ankur Ankan - Introduction to Causal Inference using pgmpy | PyData Amsterdam 2024

Learn about causal inference with pgmpy: the DAG and potential outcomes frameworks, causal discovery algorithms, evaluation metrics, and real-world applications, as covered in this PyData talk.

Key takeaways
  • There are two main frameworks for causal inference: potential outcomes framework and directed acyclic graphs (DAGs)

  • Causal discovery is challenging because several different causal graphs can imply the same observed data distribution, so the data alone often cannot identify the true causal relationships

  • The DAG framework requires significant manual intervention and expert knowledge to build accurate models, especially for identifying confounders and colliders

  • pgmpy provides tools for:

    • Causal discovery algorithms (PC, Hill Climb)
    • Expert knowledge integration
    • Parameter estimation
    • Testing implied conditional independencies
    • Simulation capabilities
  • Common evaluation metrics include:

    • Fisher’s C-test
    • Correlation score
    • Structure score
    • F1 score-based metrics
  • When choosing between potential outcomes vs DAG framework:

    • Use potential outcomes for estimating single causal effects
    • Use DAGs for broader causal discovery and understanding mechanisms
    • Consider combining both approaches when possible
  • Key challenges in causal inference:

    • Lack of ground truth data
    • Difficulty in handling reverse causality
    • Sensitivity to algorithm parameters
    • Need for large datasets
    • Problems with highly correlated variables
  • The field is actively evolving with new methods and approaches being developed regularly

  • Applications span multiple domains including epidemiology, economics, social sciences, and machine learning

  • Integration of expert knowledge and automated methods (like LLMs) can help improve model accuracy
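
The point that several causal graphs can fit the same data can be made concrete with a small sketch. It relies on a standard result: two DAGs imply the same conditional independencies exactly when they share the same skeleton (edges with direction ignored) and the same v-structures (colliders whose parents are not adjacent). The code below is plain Python written for illustration, not pgmpy's API; the function names `skeleton`, `v_structures`, and `markov_equivalent` are hypothetical.

```python
from itertools import combinations


def skeleton(edges):
    """Return the undirected skeleton: each edge as an unordered pair."""
    return {frozenset(edge) for edge in edges}


def v_structures(edges):
    """Return colliders a -> c <- b whose parents a and b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    found = set()
    for child, pars in parents.items():
        for a, b in combinations(sorted(pars), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found


def markov_equivalent(edges_1, edges_2):
    """True if both DAGs share the same skeleton and the same v-structures."""
    return (skeleton(edges_1) == skeleton(edges_2)
            and v_structures(edges_1) == v_structures(edges_2))


# Illustrative three-variable graphs (hypothetical variable names).
chain = [("X", "Y"), ("Y", "Z")]          # X -> Y -> Z
reverse_chain = [("Z", "Y"), ("Y", "X")]  # X <- Y <- Z
fork = [("Y", "X"), ("Y", "Z")]           # X <- Y -> Z
collider = [("X", "Y"), ("Z", "Y")]       # X -> Y <- Z

print(markov_equivalent(chain, fork))           # True
print(markov_equivalent(chain, reverse_chain))  # True
print(markov_equivalent(chain, collider))       # False
```

The chain, its reversal, and the fork cannot be told apart from observational data alone, which is where expert knowledge about edge directions becomes necessary. Only the collider leaves a distinct signature.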
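
The summary does not say how the F1 score-based metrics are defined. One common reading, assumed here, is to treat the directed edges of a learned graph as predictions and score them against a reference graph. The sketch below is plain Python rather than pgmpy code, the helper `edge_f1` and both edge lists are hypothetical, and it presumes a reference graph is available (for example from simulated data), which ties back to the lack of ground truth noted above.

```python
def edge_f1(true_edges, learned_edges):
    """F1 score over directed edges; a reversed edge counts as a miss."""
    true_set, learned_set = set(true_edges), set(learned_edges)
    true_positives = len(true_set & learned_set)
    precision = true_positives / len(learned_set) if learned_set else 0.0
    recall = true_positives / len(true_set) if true_set else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference graph and a hypothetical discovery result.
true_dag = [("A", "B"), ("B", "C"), ("A", "C")]
learned_dag = [("A", "B"), ("C", "B"), ("A", "D")]

score = edge_f1(true_dag, learned_dag)
print(round(score, 3))  # 0.333: one of three edges recovered with the right direction
```

Scoring the skeleton instead (ignoring direction) would give credit for the reversed B-C edge; which variant is appropriate depends on whether edge orientation matters for the question at hand.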