Bradley Dice: Hacking `import` for speed: how we wrote a GPU accelerator for pandas

Learn how NVIDIA's cuDF accelerates pandas workflows with GPU computing for 10-100x speedups, using custom import machinery for zero-code-change migration and automatic CPU fallback.

Key takeaways
  • cudf.pandas is an NVIDIA-developed GPU accelerator that enables zero-code-change acceleration of pandas workflows, offering 10-100x speedups

  • Works best for medium to large datasets (roughly 5-20 GB), with GPU memory as the main limiting factor

  • Covers roughly 60-75% of the pandas API, automatically falling back to CPU for unsupported operations

  • Integrates with the broader RAPIDS ecosystem, including Dask for scaling beyond single-GPU memory limits (see the Dask sketch after this list)

  • Available through conda and pip packages, recently added to Google Colab with GPU runtime support

  • Uses custom import machinery to intercept pandas imports and proxy them to GPU-accelerated implementations (see the first sketch after this list)

  • Includes built-in profiling tools that report which operations ran on the GPU versus the CPU (see the profiler sketch after this list)

  • Maintains compatibility with the broader Python data science ecosystem (matplotlib, scikit-learn, etc.)

  • Passes over 90% of the pandas test suite, supporting its role as a reliable drop-in replacement

  • Particularly effective for joins, groupby aggregations, and filtering (see the workload example after this list)

  • Handles memory movement between CPU and GPU automatically and transparently to users

  • Primary limitations include GPU memory constraints, incomplete API coverage, and some performance overhead from CPU/GPU transfers when falling back
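
The zero-code-change workflow hinges on activating the accelerator before pandas is first imported. A minimal sketch using the entry points documented for cudf.pandas: the Jupyter `%load_ext cudf.pandas` magic, the `python -m cudf.pandas script.py` module runner, and the explicit `cudf.pandas.install()` call (the toy DataFrame is purely illustrative):

```python
# In a Jupyter notebook, the documented pattern is to load the extension
# before importing pandas:
#   %load_ext cudf.pandas
#   import pandas as pd
# A plain script can instead be run unmodified via
#   python -m cudf.pandas script.py
# or install the proxy explicitly, as below.
import cudf.pandas

cudf.pandas.install()  # must run before the first `import pandas`

import pandas as pd  # now a proxy module backed by cuDF

# Ordinary pandas code: supported operations execute on the GPU,
# unsupported ones fall back to CPU pandas transparently.
df = pd.DataFrame({"key": ["a", "b", "a"], "value": [1, 2, 3]})
print(df.groupby("key").sum())
```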
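
To illustrate the operation classes called out above, here is a hypothetical workload combining a join, a filter, and a groupby aggregation. Under cudf.pandas it is unchanged pandas code; the data sizes are invented for illustration, and actual speedups depend on the data and the GPU:

```python
import numpy as np
import pandas as pd  # proxied to cuDF when cudf.pandas is active

rng = np.random.default_rng(0)
n = 10_000_000  # illustrative row count

left = pd.DataFrame({"id": rng.integers(0, 100_000, n), "x": rng.random(n)})
right = pd.DataFrame({"id": np.arange(100_000),
                      "group": rng.integers(0, 10, 100_000)})

# Join + filter + groupby aggregation: the patterns highlighted as
# benefiting most from GPU execution.
merged = left.merge(right, on="id")
result = merged[merged["x"] > 0.5].groupby("group")["x"].mean()
print(result)
```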
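
The profiling tools are exposed as notebook cell magics (the cuDF documentation describes `%%cudf.pandas.profile` for per-operation reports and `%%cudf.pandas.line_profile` for per-line reports). A sketch of a notebook cell, assuming the extension has already been loaded with `%load_ext cudf.pandas`:

```python
%%cudf.pandas.profile
df = pd.DataFrame({"a": [0, 1, 2], "b": [3, 4, 5]})
df.min(axis=1)                                       # expected to run on the GPU
df.apply(lambda r: r["a"] + r["b"], axis=1)          # may fall back to CPU
```

The resulting report lists each operation with whether it executed on GPU or CPU, which makes fallbacks (and their transfer overhead) visible.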
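
On scaling out, a sketch of the Dask integration, assuming the RAPIDS dask-cuda and dask-cudf packages are installed and a hypothetical Parquet dataset path; `LocalCUDACluster` starts one worker per local GPU, and the `dataframe.backend` setting asks Dask to back partitions with cuDF rather than pandas:

```python
import dask
import dask.dataframe as dd
from dask.distributed import Client
from dask_cuda import LocalCUDACluster  # from the RAPIDS dask-cuda package

if __name__ == "__main__":
    # One Dask worker per local GPU; partitions are spread across their
    # memories, lifting the single-GPU memory ceiling noted above.
    client = Client(LocalCUDACluster())

    # Back Dask DataFrame partitions with cuDF instead of pandas.
    dask.config.set({"dataframe.backend": "cudf"})

    ddf = dd.read_parquet("data/*.parquet")  # hypothetical dataset path
    print(ddf.groupby("group")["x"].mean().compute())
```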