Thomas J. Fan - Can There Be Too Much Parallelism? | SciPy 2023

This talk explores the limits of parallelism in Python, highlighting inconsistencies in library APIs, the difficulty of composing multiple parallel libraries, and potential solutions for simplifying parallelism configuration.

Key takeaways
  • Can there be too much parallelism in Python? Yes: when nested parallel layers spawn more threads than there are cores, performance degrades rather than improves. This is known as oversubscription.
  • Different libraries have different APIs and defaults for configuring parallelism, leading to inconsistencies and difficulties in using multiple libraries together.
  • OpenMP is commonly used for parallelism, but it can lead to issues when libraries use it differently.
  • Some libraries, like NumPy and SciPy, rely on inner thread pools (e.g., the BLAS implementation they call into), while others, like scikit-learn, add an outer layer of parallelism on top.
  • Establishing a common interface and default settings for parallelism would help to simplify the process of configuring parallelism.
  • Numba and OpenBLAS are libraries that handle parallelism well, but each takes its own approach.
  • The NumPy and SciPy community has already implemented a solution to avoid oversubscription.
  • PyTorch has its own way of handling parallelism, similar to OpenBLAS.
  • Some libraries, like Cycler and Polars, have their own way of configuring parallelism.
  • There is no universal solution for parallelism, but setting environment variables and configuring inner and outer thread pools can help.
  • Users may need to configure parallelism based on their specific use cases.
  • Documentation and clear guidelines are essential for configuring parallelism.
  • Using a consistent API and default settings for parallelism would make it easier for users to manage parallelism.
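The environment-variable approach mentioned above can be sketched as follows. This is a minimal example, not from the talk itself; it assumes the standard variables read by OpenMP, OpenBLAS, and MKL, which must be set before the native libraries are loaded:

```python
import os

# Cap native thread pools *before* importing NumPy/SciPy: these
# environment variables are read once, when the library loads.
os.environ["OMP_NUM_THREADS"] = "1"        # OpenMP-based pools
os.environ["OPENBLAS_NUM_THREADS"] = "1"   # OpenBLAS inner pool
os.environ["MKL_NUM_THREADS"] = "1"        # Intel MKL, if used

import numpy as np

a = np.random.rand(200, 200)
b = a @ a  # this matrix multiply now runs on a single BLAS thread
print(b.shape)
```

Setting the variables from inside the process only works before the first import; in practice they are often exported in the shell instead, and runtime control is left to dedicated tools.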
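The oversubscription problem the takeaways describe can be illustrated with nothing but the standard library. This is a hypothetical sketch, not code from the talk: each outer worker blindly spawns its own inner pool, so the total thread count multiplies:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def inner_task(x):
    return x * x

def outer_task(chunk):
    # Each outer worker spawns its own inner pool -- the nesting
    # pattern that causes oversubscription when done blindly.
    with ThreadPoolExecutor(max_workers=4) as inner:
        return sum(inner.map(inner_task, chunk))

chunks = [range(10)] * 8
with ThreadPoolExecutor(max_workers=8) as outer:
    totals = list(outer.map(outer_task, chunks))

# 8 outer workers x 4 inner threads = 32 threads competing for
# os.cpu_count() cores: more threads than hardware to run them.
print(sum(totals))
```

The same multiplication happens when, say, an outer joblib loop calls into NumPy routines backed by a multi-threaded BLAS, which is why limiting the inner pool (as the NumPy/SciPy ecosystem does) matters.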