Thomas J. Fan - Can There Be Too Much Parallelism? | SciPy 2023
Thomas Fan explores the limits of parallelism in Python, highlighting inconsistencies in library APIs, the difficulty of combining multiple parallel libraries, and potential solutions for simplifying parallelism configuration.
- Can there be too much parallelism in Python? Yes: when nested libraries together spawn more threads than there are CPU cores, performance degrades rather than improves, a problem known as oversubscription.
- Different libraries have different APIs and defaults for configuring parallelism, leading to inconsistencies and difficulties in using multiple libraries together.
- OpenMP is commonly used for parallelism, but it can lead to issues when libraries use it differently.
- Some libraries, like NumPy and SciPy, parallelize internally through inner thread pools (e.g. via their BLAS backend), while others, like scikit-learn, parallelize at the outer loop (e.g. via joblib).
- Establishing a common interface and default settings for parallelism would help to simplify the process of configuring parallelism.
- Numba and OpenBLAS handle parallelism well, but each takes its own approach.
- The NumPy and SciPy community has already implemented a solution to avoid oversubscription.
- PyTorch has its own way of handling parallelism, similar to OpenBLAS.
- Some libraries, like Cycler and Polars, have their own way of configuring parallelism.
- There is no universal solution for parallelism, but setting environment variables (e.g. OMP_NUM_THREADS) and coordinating inner and outer thread pools can help.
- Users may need to configure parallelism based on their specific use cases.
- Documentation and clear guidelines are essential for configuring parallelism.
- Using a consistent API and default settings for parallelism would make it easier for users to manage parallelism.
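The environment-variable approach mentioned above can be sketched as follows. This is a minimal illustration, not the talk's own code; the variable names are the ones OpenMP, OpenBLAS, and MKL document, and they must be set before the libraries are imported, since the values are read at load time:

```python
import os

# Cap the inner thread pools *before* importing NumPy/SciPy/PyTorch:
# these environment variables are read once, when the library loads.
os.environ["OMP_NUM_THREADS"] = "1"       # OpenMP runtimes
os.environ["OPENBLAS_NUM_THREADS"] = "1"  # OpenBLAS
os.environ["MKL_NUM_THREADS"] = "1"       # Intel MKL

# With the inner pools pinned to one thread, an outer pool
# (e.g. joblib or concurrent.futures) can use all cores without
# oversubscribing: outer x inner = cores x 1 threads in total.
n_outer_workers = os.cpu_count() or 1
print(f"outer workers: {n_outer_workers}, inner threads per worker: 1")
```

For adjusting limits at runtime instead, the threadpoolctl package (discussed in the talk) provides a `threadpool_limits(limits=1)` context manager that caps these same pools without restarting the process.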