Marco Gorelli - Understanding Polars Expressions when you're used to pandas | PyData Amsterdam 2024

Learn how Polars expressions differ from pandas, with practical examples and best practices. Discover key advantages like memory efficiency and multi-threading in this PyData talk.

Key takeaways

A Polars expression is fundamentally a function from a data frame to a series: it produces no values until it is given a data frame as input
Key advantages of Polars over pandas include:
- Better memory efficiency
- Built-in lazy and eager execution modes
- Native multi-threading capabilities
- GPU backend support
- Automatic query optimization
Expressions in Polars provide:
- A more intuitive syntax compared to pandas lambdas
- Support for multi-column operations
- Automatic type preservation
- Built-in optimizations for common operations
- Natural integration with group by operations
Best practices for transitioning from pandas:
- Avoid carrying over the pandas apply() pattern
- Use Polars for new projects
- Keep existing pandas code if it works well
- Leverage Polars' native expressions instead of recreating pandas patterns
Core constraints that Polars expressions work within:
- Every series must have a single, homogeneous data type
- Column names must be unique strings
- All series within a data frame must have the same length
- An expression yields a single series in standard operations
Polars integrates well with common ML libraries:
- Direct support in scikit-learn
- Automatic conversion to required formats
- Efficient memory handling for large datasets
Group by operations in Polars:
- More efficient than pandas equivalents
- Auto-exploding of results
- Support for complex aggregations
- Preservation of data types
