Missing Data, Bayesian Imputation and People Analytics with PyMC [PyCon DE & PyData Berlin 2024]
Learn how to handle missing data in people analytics using Bayesian imputation with PyMC. Explore statistical types of missing data, hierarchical modeling, and best practices.
-
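Missing data in surveys is commonly classified, by the mechanism that causes it, into three main types:
- Missing Completely at Random (MCAR): missingness is unrelated to any variable, observed or not
- Missing at Random (MAR): missingness depends only on observed variables
- Missing Not at Random (MNAR): missingness depends on the unobserved values themselves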
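To make the three mechanisms concrete, the sketch below simulates a hypothetical survey in which tenure is always recorded and engagement is sometimes missing. It then compares the complete-case mean of engagement under each mechanism. The variable names, effect sizes and logistic missingness rules are illustrative assumptions, not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Hypothetical survey: tenure is always observed, engagement may be missing.
tenure = rng.normal(5.0, 2.0, size=n)
engagement = 0.5 * tenure + rng.normal(0.0, 1.0, size=n)


def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))


masks = {
    # MCAR: every answer has the same 30% chance of being missing.
    "MCAR": rng.random(n) < 0.3,
    # MAR: missingness depends only on the observed tenure (newer staff skip more).
    "MAR": rng.random(n) < logistic(-(tenure - 5.0)),
    # MNAR: missingness depends on the unobserved engagement value itself.
    "MNAR": rng.random(n) < logistic(-2.0 * (engagement - engagement.mean())),
}

# Bias of the complete-case mean relative to the mean of all (partly hidden) values.
bias = {name: engagement[~m].mean() - engagement.mean() for name, m in masks.items()}
for name, b in bias.items():
    print(f"{name}: complete-case bias {b:+.3f}")
```

In this simulation, dropping incomplete rows is roughly unbiased under MCAR but shifts the estimate under both MAR and MNAR. The difference is that the MAR bias can be removed by conditioning on the observed tenure, whereas under MNAR the observed data alone cannot reveal how large the bias is.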
-
Bayesian imputation methods are recommended for theory-informed missing data analysis because they:
- Allow flexible model specification
- Handle different types of distributions
- Provide built-in sensitivity analysis
- Support a workflow for assessing model adequacy
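PyMC can impute missing observed values automatically, for example when the `observed` data contain NaNs. To keep the example dependency-free, the sketch below instead hand-codes the same idea as a data-augmentation Gibbs sampler: it alternates between drawing model parameters given the completed data and drawing the missing values given the parameters. It assumes a linear-Gaussian model of engagement on tenure, a flat prior on the coefficients, a 1/σ² prior on the variance, and hypothetical simulated MAR data; `gibbs_impute` is an illustrative helper, not a library function.

```python
import numpy as np


def gibbs_impute(X, y, n_iter=2_000, burn_in=500, seed=1):
    """Data-augmentation Gibbs sampler for y ~ N(X @ beta, sigma^2) with gaps in y.

    Assumed priors: flat on beta, p(sigma^2) proportional to 1 / sigma^2.
    Returns posterior draws of the completed-data mean of y.
    """
    rng = np.random.default_rng(seed)
    miss = np.isnan(y)
    n_rows, n_coef = X.shape
    y_fill = np.where(miss, np.nanmean(y), y)  # crude starting values for the gaps
    xtx_inv = np.linalg.inv(X.T @ X)
    chol = np.linalg.cholesky(xtx_inv)
    beta = xtx_inv @ X.T @ y_fill
    mean_draws = []
    for it in range(n_iter):
        # sigma^2 | beta, completed y ~ Inv-Gamma(n/2, SSR/2)
        ssr = np.sum((y_fill - X @ beta) ** 2)
        sigma2 = (ssr / 2.0) / rng.gamma(n_rows / 2.0)
        # beta | sigma^2, completed y ~ N(beta_hat, sigma^2 (X'X)^-1)
        beta_hat = xtx_inv @ X.T @ y_fill
        beta = beta_hat + np.sqrt(sigma2) * chol @ rng.standard_normal(n_coef)
        # Imputation step: y_mis | beta, sigma^2 ~ N(X_mis @ beta, sigma^2)
        y_fill[miss] = X[miss] @ beta + np.sqrt(sigma2) * rng.standard_normal(miss.sum())
        if it >= burn_in:
            mean_draws.append(y_fill.mean())
    return np.array(mean_draws)


# Hypothetical data: short-tenure employees skip the engagement question more often (MAR).
rng = np.random.default_rng(0)
n = 2_000
tenure = rng.normal(5.0, 2.0, size=n)
engagement = 1.0 + 0.5 * tenure + rng.normal(0.0, 1.0, size=n)
missing = rng.random(n) < 1.0 / (1.0 + np.exp(tenure - 5.0))
X = np.column_stack([np.ones(n), tenure])
y = np.where(missing, np.nan, engagement)

draws = gibbs_impute(X, y)
print(f"true mean          {engagement.mean():.2f}")
print(f"complete-case mean {np.nanmean(y):.2f}")
print(f"imputed posterior  {draws.mean():.2f} "
      f"(94% interval {np.percentile(draws, 3):.2f} to {np.percentile(draws, 97):.2f})")
```

Because each sweep redraws the missing values, the spread of `draws` reflects uncertainty about both the parameters and the imputations, which a single filled-in value would hide.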
-
Hierarchical modeling is valuable for handling missing data because it:
- Accounts for team and management structures
- Helps separate the estimated effects of different influences
- Can turn MNAR situations into MAR ones when the grouping that drives non-response (e.g. team membership) is included in the model
- Allows team-specific parameter estimates
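The partial pooling behind team-specific estimates can be seen in a normal-normal model with its hyperparameters held fixed. In the sketch below, each team's estimate is the precision-weighted compromise w_j · ȳ_j + (1 − w_j) · μ with w_j = (n_j/σ²) / (n_j/σ² + 1/τ²), so teams with few respondents are pulled more strongly toward the organization-wide mean μ. The team data are hypothetical, and μ, σ and τ are assumed known here rather than estimated as a full PyMC model would do.

```python
import numpy as np


def partial_pool(team_means, team_sizes, mu, sigma, tau):
    """Posterior mean of each team's true mean under a normal-normal model.

    Assumed model: y_ij ~ N(theta_j, sigma^2), theta_j ~ N(mu, tau^2),
    with mu, sigma and tau treated as known for illustration.
    """
    team_means = np.asarray(team_means, dtype=float)
    team_sizes = np.asarray(team_sizes, dtype=float)
    data_precision = team_sizes / sigma**2
    prior_precision = 1.0 / tau**2
    # Weight on the team's own data: 0 = complete pooling, 1 = no pooling.
    weight = data_precision / (data_precision + prior_precision)
    return weight * team_means + (1.0 - weight) * mu, weight


# Hypothetical teams: observed mean engagement and number of respondents.
means = [4.2, 3.1, 3.8, 2.0]
sizes = [40, 25, 10, 2]
estimates, weights = partial_pool(means, sizes, mu=3.5, sigma=1.0, tau=0.5)
for m, n, est, w in zip(means, sizes, estimates, weights):
    print(f"n={n:>2}  raw={m:.1f}  pooled={est:.2f}  weight on own data={w:.2f}")
```

The two-person team moves from a raw mean of 2.0 to 3.0, while the 40-person team barely moves. In a full hierarchical model, τ would itself be estimated from the data.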
-
In a people analytics context:
- Career decisions require models whose assumptions can be justified
- Power relationships and hierarchies influence data collection
- Survey non-response patterns may reveal organizational inefficiencies
- Team-management mismatches can be identified
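A first look at non-response patterns can be as simple as comparing each team's response rate with the overall rate. The sketch below uses a hypothetical invitation log and a crude normal-approximation z-score that assumes independent responses; a flagged team is a prompt for follow-up, not proof of an organizational problem.

```python
import numpy as np

# Hypothetical invitation log: one entry per invited employee.
teams = np.repeat(["A", "B", "C"], [50, 40, 30])
responded = np.concatenate([
    np.repeat([True, False], [45, 5]),   # team A: 45 of 50 responded
    np.repeat([True, False], [30, 10]),  # team B: 30 of 40
    np.repeat([True, False], [12, 18]),  # team C: 12 of 30
])

overall = responded.mean()
flags = {}
for team in np.unique(teams):
    r = responded[teams == team]
    rate = r.mean()
    # Normal-approximation z-score of the team rate against the overall rate.
    z = (rate - overall) / np.sqrt(overall * (1 - overall) / r.size)
    flags[str(team)] = z < -2
    print(f"team {team}: response rate {rate:.0%}, z = {z:+.2f}, flagged = {flags[str(team)]}")
```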
-
Technical implementation considerations:
- Variables should be ordered by degree of missingness
- Multiple distribution types can be handled in the same model
- Priors should be carefully selected based on domain knowledge
- Cross-validation and model adequacy checks are essential
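One reason to order variables by missingness is that a monotone pattern (whenever a variable is missing, every variable after it is missing too) lets the joint model factor into a sequence of conditional models, each variable conditioned on the more complete ones before it. The helper functions and toy survey below are hypothetical, a sketch of that bookkeeping rather than the talk's own code.

```python
import numpy as np


def order_by_missingness(data):
    """Return column names sorted from least to most missing, plus the fractions."""
    frac = {name: np.isnan(col).mean() for name, col in data.items()}
    return sorted(frac, key=frac.get), frac


def is_monotone(data, order):
    """True if, whenever a variable is missing, every later variable in `order` is too."""
    for earlier, later in zip(order, order[1:]):
        # A row where `earlier` is missing but `later` is observed breaks monotonicity.
        if np.any(np.isnan(data[earlier]) & ~np.isnan(data[later])):
            return False
    return True


nan = np.nan
data = {
    "wellbeing": np.array([3.0, nan, nan, 4.0, nan]),
    "tenure": np.array([2.0, 5.0, 1.0, 7.0, 3.0]),
    "engagement": np.array([4.0, 3.0, nan, 5.0, nan]),
}
order, frac = order_by_missingness(data)
print(order, {k: round(v, 2) for k, v in frac.items()}, is_monotone(data, order))
```

Here the order tenure, engagement, wellbeing is monotone, which suggests the factorization p(tenure) · p(engagement | tenure) · p(wellbeing | tenure, engagement).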
-
Practical recommendations:
- Run pilot experiments to gather information for priors
- Consult subject matter experts for model construction
- Validate the model and iterate as necessary
- Document your assumptions about the data-generating process
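As a minimal sketch of turning a pilot into a prior, assume the quantity of interest is a survey response rate with a Beta prior, so the pilot and the main survey can each be folded in with the conjugate Beta-Binomial update. All counts are hypothetical, and `BetaBelief` is an illustrative helper class. Before updating, a prior predictive check asks whether the main-survey result is plausible under the pilot-based prior, a simple instance of the validation step.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class BetaBelief:
    """Beta(a, b) belief about a proportion, e.g. a survey response rate."""
    a: float
    b: float

    @property
    def mean(self) -> float:
        return self.a / (self.a + self.b)

    def update(self, successes: int, trials: int) -> "BetaBelief":
        # Conjugate Beta-Binomial update.
        return BetaBelief(self.a + successes, self.b + trials - successes)


# Hypothetical pilot: 18 of 30 invited employees responded; start from a flat Beta(1, 1).
pilot_prior = BetaBelief(1, 1).update(successes=18, trials=30)  # Beta(19, 13)

# Prior predictive check for a main survey with 200 invitations.
rng = np.random.default_rng(7)
p_draws = rng.beta(pilot_prior.a, pilot_prior.b, size=10_000)
sim_responses = rng.binomial(200, p_draws)
low, high = np.percentile(sim_responses, [5, 95])
observed = 130  # hypothetical main-survey result
print(f"prior predictive 90% interval: {low:.0f} to {high:.0f}; observed: {observed}")

# If the observation is plausible under the prior, update with the main survey.
posterior = pilot_prior.update(successes=observed, trials=200)
print(f"posterior: Beta({posterior.a}, {posterior.b}), mean {posterior.mean:.3f}")
```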