Bradley Dice: Hacking `import` for speed: how we wrote a GPU accelerator for pandas

Learn how NVIDIA's cuDF accelerates pandas workflows with GPU computing for 10-100x speedups, using custom import machinery for zero-code-change migration and automatic CPU fallback.

Key takeaways
  • cudf.pandas is an NVIDIA-developed GPU accelerator that enables zero-code-change acceleration of pandas workflows, offering 10-100x speedups

  • Works best for medium to large datasets (roughly 5-20 GB), with GPU memory as the main limiting factor

  • Covers roughly 60-75% of the pandas API, automatically falling back to CPU for unsupported operations

  • Integrates with the broader RAPIDS ecosystem, including Dask for scaling beyond single-GPU memory limits (see the Dask sketch after this list)

  • Available through conda and pip packages, recently added to Google Colab with GPU runtime support

  • Uses custom import machinery to intercept pandas imports and proxy them to GPU-accelerated implementations (see the first sketch after this list)

  • Includes built-in profiling tools that report which operations ran on the GPU versus the CPU (see the profiler sketch after this list)

  • Maintains compatibility with the broader Python data science ecosystem (matplotlib, scikit-learn, etc.)

  • Passes over 90% of the pandas test suite, supporting its role as a reliable drop-in replacement

  • Particularly effective for joins, groupby aggregations, and filtering (see the workload example after this list)

  • Handles memory movement between CPU and GPU automatically and transparently to users

  • Primary limitations include GPU memory constraints, incomplete API coverage, and some performance overhead from CPU/GPU transfers when falling back
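
The zero-code-change workflow hinges on activating the accelerator before pandas is first imported. A minimal sketch using the entry points documented for cudf.pandas: the Jupyter `%load_ext cudf.pandas` magic, the `python -m cudf.pandas script.py` module runner, and the explicit `cudf.pandas.install()` call (the toy DataFrame is purely illustrative):

```python
# In a Jupyter notebook, the documented pattern is to load the extension
# before importing pandas:
#   %load_ext cudf.pandas
#   import pandas as pd
# A plain script can instead be run unmodified via
#   python -m cudf.pandas script.py
# or install the proxy explicitly, as below.
import cudf.pandas

cudf.pandas.install()  # must run before the first `import pandas`

import pandas as pd  # now a proxy module backed by cuDF

# Ordinary pandas code: supported operations execute on the GPU,
# unsupported ones fall back to CPU pandas transparently.
df = pd.DataFrame({"key": ["a", "b", "a"], "value": [1, 2, 3]})
print(df.groupby("key").sum())
```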
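
To illustrate the operation classes called out above, here is a hypothetical workload combining a join, a filter, and a groupby aggregation. Under cudf.pandas it is unchanged pandas code; the data sizes are invented for illustration, and actual speedups depend on the data and the GPU:

```python
import numpy as np
import pandas as pd  # proxied to cuDF when cudf.pandas is active

rng = np.random.default_rng(0)
n = 10_000_000  # illustrative row count

left = pd.DataFrame({"id": rng.integers(0, 100_000, n), "x": rng.random(n)})
right = pd.DataFrame({"id": np.arange(100_000),
                      "group": rng.integers(0, 10, 100_000)})

# Join + filter + groupby aggregation: the patterns highlighted as
# benefiting most from GPU execution.
merged = left.merge(right, on="id")
result = merged[merged["x"] > 0.5].groupby("group")["x"].mean()
print(result)
```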
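
The profiling tools are exposed as notebook cell magics (the cuDF documentation describes `%%cudf.pandas.profile` for per-operation reports and `%%cudf.pandas.line_profile` for per-line reports). A sketch of a notebook cell, assuming the extension has already been loaded with `%load_ext cudf.pandas`:

```python
%%cudf.pandas.profile
df = pd.DataFrame({"a": [0, 1, 2], "b": [3, 4, 5]})
df.min(axis=1)                                       # expected to run on the GPU
df.apply(lambda r: r["a"] + r["b"], axis=1)          # may fall back to CPU
```

The resulting report lists each operation with whether it executed on GPU or CPU, which makes fallbacks (and their transfer overhead) visible.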
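
On scaling out, a sketch of the Dask integration, assuming the RAPIDS dask-cuda and dask-cudf packages are installed and a hypothetical Parquet dataset path; `LocalCUDACluster` starts one worker per local GPU, and the `dataframe.backend` setting asks Dask to back partitions with cuDF rather than pandas:

```python
import dask
import dask.dataframe as dd
from dask.distributed import Client
from dask_cuda import LocalCUDACluster  # from the RAPIDS dask-cuda package

if __name__ == "__main__":
    # One Dask worker per local GPU; partitions are spread across their
    # memories, lifting the single-GPU memory ceiling noted above.
    client = Client(LocalCUDACluster())

    # Back Dask DataFrame partitions with cuDF instead of pandas.
    dask.config.set({"dataframe.backend": "cudf"})

    ddf = dd.read_parquet("data/*.parquet")  # hypothetical dataset path
    print(ddf.groupby("group")["x"].mean().compute())
```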