Marco Gorelli - Understanding Polars Expressions when you're used to pandas | PyData Amsterdam 2024
Learn how Polars expressions differ from pandas, with practical examples and best practices. Discover key advantages like memory efficiency and multi-threading in this PyData talk.
A Polars expression is fundamentally a function from a data frame to a series: it produces no values until it is given a data frame as input.
Key advantages of Polars over Pandas include:
- Better memory efficiency
- Built-in lazy and eager execution modes
- Native multi-threading capabilities
- GPU backend support
- Automatic query optimization
Expressions in Polars provide:
- A more intuitive syntax compared to Pandas lambdas
- Support for multi-column operations
- Automatic type preservation
- Built-in optimizations for common operations
- Natural integration with group by operations
Best practices for transitioning from Pandas:
- Avoid using the Pandas apply() function
- Use Polars for new projects
- Keep existing Pandas code if it works well
- Leverage Polars’ native expressions instead of recreating Pandas patterns
Core limitations of Polars expressions:
- Every series must have a single, homogeneous data type
- Column names must be unique strings
- All series in a data frame must have the same length
- Single series output in standard operations
Polars integrates well with common ML libraries:
- Direct support in scikit-learn
- Automatic conversion to required formats
- Efficient memory handling for large datasets
Group by operations in Polars:
- More efficient than Pandas equivalents
- Auto-exploding of results
- Support for complex aggregations
- Preservation of data types