Matt Harrison - An Introduction to Pandas 2, Polars, and DuckDB | PyData Global 2023

Explore data processing with Pandas, Polars, and DuckDB, and learn how to optimize code, improve performance, and reduce computational overhead with Polars and PyArrow.

Key takeaways
  • The importance of having a query engine to optimize code and reduce execution time.
  • Polars, a query engine with a DataFrame API, can be used to improve performance and reduce computational overhead.
  • A main difference between Polars and Pandas is that Polars can take a chain of query operations and optimize it as a whole.
  • DuckDB, also a query engine, can run queries against data stored in Pandas or Polars.
  • The importance of leveraging PyArrow for faster data processing.
  • The potential drawbacks of using Pandas, including its high memory usage and slow performance.
  • The importance of considering the size of data and the type of operations being performed when choosing a data processing library.
  • The ability to use Polars to port Pandas code to a faster and more efficient version.
  • The importance of understanding the data and the context in which it is being used when choosing a data processing library.
  • The potential benefits of using Polars, including improved performance and reduced computational overhead.