Mostly Harmless Fixed Effects Regression in Python with PyFixest [PyCon DE & PyData Berlin 2024]

Learn about PyFixest, a high-performance Python library for fixed effects regression analysis, featuring parallel C++ algorithms and compatibility with R's Fixest package.

Key takeaways
  • PyFixest is a Python port of the R Fixest library that provides high-performance fixed effects regression

  • The library offers significant performance improvements: roughly 5-10x faster than alternatives, on par with the Julia implementation, and close to the performance of R’s Fixest

  • Key features include:

    • OLS regression with high-dimensional fixed effects
    • Poisson regression
    • Instrumental variable models
    • Advanced inference techniques (cluster robust, wild cluster bootstrap)
    • Multiple estimation (fitting several models from a single call)
    • Post-processing options for visualization and result comparison
  • Uses Wilkinson formulas for model specification, giving an intuitive syntax similar to R's

  • Core algorithms are implemented in C++ and parallelized for performance optimization

  • Automatically handles practical considerations like:

    • Dropping multicollinear variables
    • Computing robust standard errors
    • Providing sensible defaults
  • Built with compatibility in mind: results match those of R’s Fixest implementation exactly

  • Designed to be familiar to R Fixest users: it maintains a similar API and behavior

  • Can handle large datasets efficiently (tested with 10M+ observations)

  • Current limitations include:

    • No GLS/WLS support for Poisson regression
    • Performance may degrade with very high-dimensional categorical features
    • Some IV diagnostics still missing
  • Active development continues, with a focus on expanding features and improving code quality
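
To make the "OLS regression with high-dimensional fixed effects" takeaway concrete: estimators of this kind commonly absorb the fixed effects by demeaning the outcome and regressors within each group, sweeping over the groups repeatedly (alternating projections), and then running OLS on the demeaned data. The following is a minimal pure-Python sketch of that general idea, not PyFixest's implementation (which the talk describes as parallelized C++). The function names `demean_by_group`, `demean`, and `within_slope`, and the toy panel, are assumptions made for illustration.

```python
from collections import defaultdict


def demean_by_group(values, groups):
    """Subtract each group's mean from its members (one projection step)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def demean(values, fixed_effects, tol=1e-10, max_iter=1000):
    """Alternating projections: sweep over every fixed effect until stable."""
    current = list(values)
    for _ in range(max_iter):
        previous = current
        for groups in fixed_effects:
            current = demean_by_group(current, groups)
        if max(abs(a - b) for a, b in zip(current, previous)) < tol:
            break
    return current


def within_slope(y, x, fixed_effects):
    """OLS slope of y on a single regressor x after absorbing fixed effects."""
    y_t = demean(y, fixed_effects)
    x_t = demean(x, fixed_effects)
    return sum(a * b for a, b in zip(x_t, y_t)) / sum(a * a for a in x_t)


# Toy balanced panel: 3 firms observed in 3 years, built with a true slope of 2.
firm = [0, 0, 0, 1, 1, 1, 2, 2, 2]
year = [0, 1, 2, 0, 1, 2, 0, 1, 2]
x = [1.0, 2.0, 4.0, 3.0, 1.0, 2.0, 5.0, 7.0, 6.0]
firm_effect = {0: 10.0, 1: -5.0, 2: 3.0}
year_effect = {0: 0.0, 1: 1.5, 2: -2.0}
y = [2.0 * xi + firm_effect[f] + year_effect[t] for xi, f, t in zip(x, firm, year)]

# The firm and year effects are absorbed, so the slope on x is recovered.
slope = within_slope(y, x, [firm, year])
```

Because the toy outcome contains no noise, `slope` recovers the true coefficient up to floating-point error. On unbalanced panels the sweep generally needs more iterations, which is one reason production libraries push this loop into compiled, parallel code.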
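
The cluster-robust inference mentioned among the key features can be illustrated with the textbook sandwich formula for a single regressor: residual scores are summed within each cluster before being squared. This is a hedged sketch under simplifying assumptions (one regressor, no intercept, no small-sample correction), so its numbers will not match software that rescales the variance by default. The helper names `ols_slope` and `cluster_robust_se` and the toy data are illustrative, not part of any library.

```python
from collections import defaultdict
from math import sqrt


def ols_slope(y, x):
    """Slope of a no-intercept regression of y on a single regressor x."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def cluster_robust_se(y, x, clusters):
    """Sandwich standard error that sums the scores x_i * e_i within clusters.

    No small-sample correction is applied.
    """
    beta = ols_slope(y, x)
    residuals = [yi - beta * xi for xi, yi in zip(x, y)]
    bread = sum(xi * xi for xi in x)
    scores = defaultdict(float)
    for xi, ei, g in zip(x, residuals, clusters):
        scores[g] += xi * ei
    meat = sum(s * s for s in scores.values())
    return sqrt(meat) / bread


x = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y = [-3.9, -2.3, 0.4, 1.8, 4.5, 5.7]
clusters = ["a", "a", "b", "b", "c", "c"]

se_clustered = cluster_robust_se(y, x, clusters)

# Treating every observation as its own cluster reduces the formula to the
# heteroskedasticity-robust (HC0) standard error.
se_hc0 = cluster_robust_se(y, x, list(range(len(x))))
```

Comparing `se_clustered` with `se_hc0` shows how the choice of clustering changes the reported uncertainty even though the point estimate is identical.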
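
Dropping multicollinear variables, listed among the practical considerations, amounts to detecting columns that are linear combinations of columns already in the model. One simple way to do this, shown here as an assumption-laden sketch and not as the method PyFixest uses, is modified Gram-Schmidt: project each new column onto the columns kept so far and drop it if nothing is left. The function name `split_collinear`, the tolerance, and the example design are hypothetical.

```python
from math import sqrt


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def split_collinear(columns, tol=1e-8):
    """Return (kept, dropped) column names using modified Gram-Schmidt.

    `columns` maps a variable name to its list of values. A column is dropped
    when what remains after projecting out the kept columns is numerically zero.
    """
    basis, kept, dropped = [], [], []
    for name, values in columns.items():
        residual = list(values)
        for q in basis:
            weight = dot(residual, q)
            residual = [r - weight * qi for r, qi in zip(residual, q)]
        norm = sqrt(dot(residual, residual))
        scale = sqrt(dot(values, values))
        if norm <= tol * max(scale, 1.0):
            dropped.append(name)
        else:
            kept.append(name)
            basis.append([r / norm for r in residual])
    return kept, dropped


x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
design = {
    "intercept": [1.0] * 5,
    "x1": x1,
    "x2": x2,
    "x3": [a + 2.0 * b for a, b in zip(x1, x2)],  # exact linear combination
}

kept, dropped = split_collinear(design)
```

Here `x3` is built as `x1 + 2 * x2`, so it ends up in `dropped` while the other three columns are kept. Which column of a collinear set gets dropped depends on column order, which is worth keeping in mind when comparing output across packages.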