Marco Gorelli - Understanding Polars Expressions when you're used to pandas | PyData Amsterdam 2024

Learn how Polars expressions differ from pandas, with practical examples and best practices. Discover key advantages like memory efficiency and multi-threading in this PyData talk.

Key takeaways
  • Polars expressions are essentially functions from a DataFrame to a Series: they describe a computation and produce no values until they are evaluated against a DataFrame

  • Key advantages of Polars over pandas include:

    • Better memory efficiency
    • Built-in lazy and eager execution modes
    • Native multi-threading capabilities
    • GPU backend support
    • Automatic query optimization in lazy mode
  • Expressions in Polars provide:

    • A more readable syntax than pandas lambdas
    • Support for multi-column operations
    • Automatic type preservation
    • Built-in optimizations for common operations
    • Natural integration with group by operations
  • Best practices for transitioning from pandas:

    • Avoid using the pandas apply() function
    • Use Polars for new projects
    • Keep existing pandas code if it works well
    • Leverage Polars’ native expressions instead of recreating pandas patterns
  • Core limitations of Polars expressions:

    • Each Series must have a single (homogeneous) data type
    • Column names must be unique strings
    • All Series within a DataFrame must have the same length
    • Standard operations output a single Series
  • Polars integrates well with common ML libraries:

    • Direct support in scikit-learn
    • Automatic conversion to required formats
    • Efficient memory handling for large datasets
  • Group by operations in Polars:

    • More efficient than their pandas equivalents
    • Auto-exploding of results
    • Support for complex aggregations
    • Preservation of data types