Ian Ozsvald - Making Pandas Fly
Optimize your Pandas workflow with these high-performance tips: use categorical data, use nullable data types, reduce memory usage, and more. The topics are explored further in the speaker's book and online resources.
- Use the categorical data type instead of strings, which are expensive in RAM and slow down operations.
- Use nullable data types such as `Int64` and `boolean` for better performance and to reduce data size.
- Installing the `bottleneck` library can improve the performance of certain operations.
- Using `Dask` can improve performance by splitting data into smaller chunks and processing them in parallel.
- `Modin` extends the pandas idea in different ways.
- Think about using `float32` instead of `float64` for numeric operations.
- Consider installing `numexpr` for faster computations.
- Use `string` instead of `object` for datetime columns.
- The `category` type is not a magic string type; use it correctly.
- Use `nbytes` to check memory usage.
- The speaker's book, "High Performance Python", provides more information on these topics.
- Follow the speaker's blog for updates and courses.
- The speaker’s community, IPython, has many resources available.
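
The memory-related tips above (categoricals, `float32`, nullable types, and checking sizes) can be tried out with a minimal sketch like the following. It is not taken from the talk: the column names, the synthetic data, and the `mem_bytes` helper are assumptions made up for illustration, and the savings on real data will depend on how many distinct values a column holds. Note that `float32` roughly halves memory at the cost of precision, so it may not suit every calculation.

```python
import numpy as np
import pandas as pd

# Hypothetical data: names and values are invented for this illustration.
n = 100_000
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "city": rng.choice(["London", "Paris", "Berlin"], size=n),  # few distinct strings
    "price": rng.random(n) * 100.0,                             # float64 by default
})

def mem_bytes(series):
    # Hypothetical helper. deep=True also counts the string data itself,
    # which nbytes does not do for object columns (it counts pointers only).
    return series.memory_usage(index=False, deep=True)

before = {col: mem_bytes(df[col]) for col in df.columns}

# Apply two of the tips: low-cardinality strings become a categorical,
# and float64 is downcast to float32.
slim = df.assign(
    city=df["city"].astype("category"),
    price=df["price"].astype("float32"),
)
after = {col: mem_bytes(slim[col]) for col in slim.columns}

for col in df.columns:
    print(f"{col}: {before[col]:,} bytes -> {after[col]:,} bytes")

# nbytes is a quick check for numeric columns.
print("price nbytes:", df["price"].nbytes, "->", slim["price"].nbytes)

# Nullable extension types keep missing values without falling back
# to float64 or object.
visits_nullable = pd.Series([1, 2, None], dtype="Int64")
flags_nullable = pd.Series([True, None, False], dtype="boolean")
print(visits_nullable.dtype, flags_nullable.dtype)
```

A categorical pays off here because the column has only three distinct values. With mostly unique strings, the category lookup table grows with the data and the benefit shrinks, which is one reading of the warning that `category` is not a magic string type.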