Ian Ozsvald - Making Pandas Fly

Optimize your Pandas workflow with these high-performance tips: use categorical data, adopt nullable data types, reduce memory usage, and more. The topics are explored further in the speaker's book and online resources.

Key takeaways
  • Use the categorical data type instead of strings, which are expensive in RAM and slow to operate on.
  • Use nullable data types like Int64 and boolean for better performance and to reduce data size.
  • Installing the bottleneck library can improve the performance of certain operations.
  • Using Dask can improve performance by splitting data into smaller chunks and processing them in parallel.
  • Modin extends the pandas idea in a different way, keeping the pandas API while parallelizing the work behind it.
  • Think about using float32 instead of float64 for numeric operations when the reduced precision is acceptable.
  • Consider installing numexpr for faster computations.
  • Avoid the generic object dtype: use the dedicated string dtype for text columns and a datetime dtype for datetime columns.
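  • The category type is not a magic string type; it pays off when a column has few distinct values relative to its length.
  • Use nbytes to check memory usage; for object columns, memory_usage(deep=True) also counts the Python strings themselves.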
  • The speaker’s book, “High Performance Python”, provides more information on these topics.
  • Follow the speaker’s blog for more updates and courses.
  • The speaker’s community, IPython, has many resources available.
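
The dtype and memory tips above can be tried out with a short sketch. This is an illustration rather than code from the talk: the column names, the sample data, and the deep_bytes helper are all made up for the example, and the actual savings will depend on your data.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: a text column with only three distinct values
# and a float64 column. Names and values are invented for illustration.
n = 100_000
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "city": rng.choice(["London", "Paris", "Berlin"], size=n),
    "price": rng.random(n) * 100,
})


def deep_bytes(series):
    """Bytes used by a Series, including the string data it refers to."""
    return series.memory_usage(index=False, deep=True)


# Strings -> category: each distinct value is stored once and the rows
# hold small integer codes.
city_cat = df["city"].astype("category")

# float64 -> float32: half the bytes per value, at the cost of precision.
price_32 = df["price"].astype("float32")

# Nullable Int64 keeps integers as integers when a value is missing;
# the default inference falls back to float64 with NaN.
counts_default = pd.Series([1, 2, None])
counts_nullable = pd.Series([1, 2, None], dtype="Int64")

print("city as text:    ", deep_bytes(df["city"]), "bytes")
print("city as category:", deep_bytes(city_cat), "bytes")
print("price float64:   ", df["price"].nbytes, "bytes")
print("price float32:   ", price_32.nbytes, "bytes")
print("dtypes with a missing value:", counts_default.dtype, "vs", counts_nullable.dtype)
```

Measuring before and after each conversion is the point of the exercise: a category column with many distinct values, for example, may save little or nothing.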