Ashwin Srinath - cudf.pandas: The Zero Code Change GPU Accelerator for Pandas | PyData Global 2023

Discover cudf.pandas, a zero-code-change GPU accelerator for Pandas that reports speedups of 10x to 100x and supports roughly 60-75% of the Pandas API.

Key takeaways
  • cudf.pandas is a GPU accelerator for Pandas that allows you to run Pandas code on the GPU without modifying the code.
  • It works through a proxy layer that intercepts Pandas API calls and dispatches them to cuDF on the GPU, falling back to Pandas on the CPU when an operation is not supported.
  • The proxy library contains proxy functions and proxy types that mimic the behavior of Pandas.
  • cudf.pandas supports about 60-75% of the Pandas API, and supported operations can run 10x to 100x faster on the GPU.
  • It can be used with third-party libraries that use the C API or subclass Pandas data frames.
  • cudf.pandas is designed to work with large data sets and can handle data sizes that exceed GPU memory.
  • It can be integrated with other tools such as Dask and Conda.
  • cudf.pandas is still a new project, but it has already achieved significant speedups in some benchmarks.
  • It passes about 94% of the Pandas test suite.
  • cudf.pandas is a good option for data scientists and developers who need to work with large data sets and want to take advantage of GPU acceleration.
  • cudf.pandas is not a replacement for Pandas, but rather a complementary tool that can be used alongside Pandas.
  • cudf.pandas is built on cuDF, a GPU-based data frame library that provides a Pandas-like API.
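
The proxy idea in the takeaways can be illustrated with a toy sketch. This is not the cudf.pandas source: `FastFrame`, `SlowFrame`, and `ProxyFrame` are hypothetical names invented here, standing in for a GPU-backed type with partial API coverage, the full CPU type, and the proxy that chooses between them.

```python
class FastFrame:
    """Hypothetical stand-in for a GPU-backed frame with partial API coverage."""

    def __init__(self, values):
        self.values = list(values)

    def total(self):
        return sum(self.values)


class SlowFrame:
    """Hypothetical stand-in for the full-featured CPU frame."""

    def __init__(self, values):
        self.values = list(values)

    def total(self):
        return sum(self.values)

    def median(self):
        ordered = sorted(self.values)
        mid = len(ordered) // 2
        if len(ordered) % 2:
            return ordered[mid]
        return (ordered[mid - 1] + ordered[mid]) / 2


class ProxyFrame:
    """Forwards method calls to the fast type and falls back to the slow one."""

    def __init__(self, values):
        self._fast = FastFrame(values)
        self.calls = []  # records which backend served each call

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails on the proxy itself.
        if name.startswith("_"):
            raise AttributeError(name)

        def dispatch(*args, **kwargs):
            method = getattr(self._fast, name, None)
            backend = "fast"
            if method is None:
                # The fast type lacks this method: copy the data to the
                # slow type and run the call there instead.
                slow = SlowFrame(self._fast.values)
                method = getattr(slow, name)
                backend = "slow"
            self.calls.append((name, backend))
            return method(*args, **kwargs)

        return dispatch


frame = ProxyFrame([3, 1, 2])
print(frame.total())   # served by FastFrame
print(frame.median())  # FastFrame has no median, so SlowFrame serves it
print(frame.calls)
```

The calling code never names a backend, which is the sense in which no code change is needed. With the real accelerator, the documented entry points are `%load_ext cudf.pandas` in a notebook and `python -m cudf.pandas script.py` on the command line.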