Matthew Rocklin - Dask in Production | SciPy 2024

Matthew Rocklin shares practical insights on running Dask in production: reducing cloud costs with ARM and spot instances, avoiding common infrastructure pitfalls, and tips for deployment.

Key takeaways
  • Cloud computing costs can be significantly reduced by:

    • Using ARM instances instead of Intel (about 5% faster and cheaper)
    • Leveraging spot instances when available
    • Turning off resources when not actively in use
    • Running workloads close to where data is stored (see the cluster-configuration sketch after this list)
  • Running Dask in production revealed:

    • The Global Interpreter Lock (GIL) is usually not a bottleneck (only ~25% contention; see the GIL sketch after this list)
    • Most workloads can process 1TB of data in ~5 hours for ~10 cents
    • Scaling is underutilized because people think it’s more expensive than it is
    • Raw cloud architecture (basic EC2 + networking) often works better than complex Kubernetes setups
  • Common cloud infrastructure challenges:

    • Docker wasn’t designed for rapid development cycles
    • Serverless functions (e.g., AWS Lambda) are about 4x more expensive than regular instances
    • Users often leave large VMs running 24/7 unnecessarily
    • Moving data between regions/services is extremely costly
  • Success factors for cloud deployments:

    • Making cloud environments match local development environments
    • Collecting detailed metrics on usage patterns (see the performance-report sketch after this list)
    • Supporting hardware flexibility across regions/instance types
    • Enabling rapid environment synchronization
  • The scientific Python ecosystem is increasingly ARM-compatible:

    • 90-95% of workloads can run on ARM
    • Only specific cases (like MKL-dependent code) require Intel
    • Community should move towards ARM as the default
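
Illustrative sketches

The cost levers above are described at a high level in the talk; the sketch below shows one way they might be combined with Coiled and Dask. The arm, spot_policy, and region parameters follow Coiled's cluster API but are assumptions here and may differ across versions; the bucket path and workload are hypothetical.

```python
# Minimal sketch (not from the talk): applying the cost levers with Coiled + Dask.
import coiled
import dask.dataframe as dd
from dask.distributed import Client

cluster = coiled.Cluster(
    n_workers=20,
    region="us-east-1",                # run next to the data to avoid cross-region egress
    arm=True,                          # ARM (Graviton) workers: cheaper and roughly as fast
    spot_policy="spot_with_fallback",  # prefer spot capacity, fall back to on-demand
)
cluster.adapt(minimum=0, maximum=100)  # scale down to zero when idle instead of running 24/7
client = Client(cluster)

# Hypothetical dataset stored in the same region as the cluster.
df = dd.read_parquet("s3://my-bucket/events/")
print(df.groupby("user_id")["value"].mean().compute())

cluster.shutdown()  # turn the resources off when the job is done
```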
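
On the GIL takeaway: most numeric libraries (NumPy, pandas, Arrow) release the GIL inside their compiled routines, which is the main reason contention stays low in practice. The sketch below (plain Python and NumPy, not from the talk) contrasts a pure-Python loop that holds the GIL with a NumPy reduction that releases it, run across four threads.

```python
# Illustrative only: why numeric workloads rarely fight over the GIL.
import time
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def python_work(n=2_000_000):
    # Pure-Python loop: holds the GIL, so threads cannot run it in parallel.
    total = 0
    for i in range(n):
        total += i * i
    return total

def numpy_work(n=20_000_000):
    # NumPy releases the GIL in its C loops, so threads overlap.
    x = np.arange(n, dtype="float64")
    return float((x * x).sum())

def timed(fn, workers=4):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda _: fn(), range(workers)))
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"pure Python, 4 threads: {timed(python_work):.2f}s")  # largely serialized by the GIL
    print(f"NumPy,       4 threads: {timed(numpy_work):.2f}s")   # threads overlap
```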
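
On collecting metrics: one lightweight way to do this with Dask itself is the performance_report context manager, which saves the task stream, worker profiles, and transfer activity for a computation to a standalone HTML file. The array workload below is arbitrary.

```python
# Minimal sketch: record detailed metrics for one computation with Dask's
# built-in performance_report (writes a standalone HTML file).
import dask.array as da
from dask.distributed import Client, performance_report

client = Client()  # local cluster here; point at a remote scheduler in production

with performance_report(filename="dask-report.html"):
    x = da.random.random((20_000, 20_000), chunks=(2_000, 2_000))
    (x @ x.T).mean().compute()
```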