From built-in concurrency primitives to large scale distributed computing — Jakub Urban
Learn about Python's concurrency tools, from built-in primitives to distributed computing frameworks like Dask and Ray, and discover best practices for scaling applications.
- Python provides powerful built-in concurrency primitives through modules like `concurrent.futures`, `threading`, `multiprocessing`, and `asyncio`
- The `concurrent.futures` module (introduced in Python 3.2) offers high-level abstractions for concurrent execution through `ThreadPoolExecutor` and `ProcessPoolExecutor`
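A minimal sketch of the `ThreadPoolExecutor` API; the task function and worker count here are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def task(n):
    # Placeholder for an I/O-bound operation such as an HTTP request
    return n * n

with ThreadPoolExecutor(max_workers=4) as pool:
    # submit() returns a Future; result() blocks until that task completes
    futures = [pool.submit(task, n) for n in range(5)]
    results = [f.result() for f in futures]

print(results)  # [0, 1, 4, 9, 16]
```

Swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` keeps the same interface but runs tasks in separate processes.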
- Concurrency means managing multiple tasks whose execution overlaps in time; parallelism specifically means tasks run at the same instant across multiple processing units
- Key limitations to consider:
  - Global Interpreter Lock (GIL) for threading
  - Memory constraints for process-based parallelism
  - Serialization challenges with `pickle`
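The `pickle` limitation can be shown directly: work sent to a `ProcessPoolExecutor` must be picklable, and objects such as lambdas are not. A small illustration:

```python
import pickle

square = lambda x: x * x  # lambdas cannot be looked up by name for unpickling

try:
    pickle.dumps(square)
    picklable = True
except (pickle.PicklingError, AttributeError):
    picklable = False

print(picklable)  # False: a process pool could not receive this task
```

Defining the function at module level with `def` avoids the problem, which is why worker functions for process-based parallelism are usually written that way.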
- For scaling beyond a single machine, frameworks like Dask and Ray provide:
  - Distributed computing capabilities
  - Data management across workers
  - Resource management and scheduling
  - Fault tolerance
  - Integration with async/await
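The async/await integration point can be sketched with the standard library alone: `asyncio` can off-load blocking work to a `concurrent.futures` pool via `run_in_executor` (the `blocking_io` function below is a stand-in):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

def blocking_io(n):
    # Stands in for a blocking call (file read, legacy client library, ...)
    return n + 1

async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor() as pool:
        # Off-load blocking work to the pool without stalling the event loop
        tasks = [loop.run_in_executor(pool, blocking_io, n) for n in range(3)]
        return await asyncio.gather(*tasks)

results = asyncio.run(main())
print(results)  # [1, 2, 3]
```

Dask and Ray expose similar awaitable handles for remote tasks, so the same coding style carries over to the distributed setting.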
- Best practices for concurrent/parallel processing:
  - Profile code before optimization
  - Process data in chunks when possible
  - Consider resource limitations (CPU, memory)
  - Use memory mapping for large datasets
  - Choose appropriate executor based on workload type (I/O vs CPU-bound)
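Chunked processing can be sketched as follows; the chunk size and workload are illustrative, and the point is that each task amortizes scheduling overhead over many items:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

def chunked(iterable, size):
    # Yield successive fixed-size chunks from any iterable
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk

def process_chunk(chunk):
    # One task handles a whole chunk instead of a single item
    return sum(x * x for x in chunk)

data = range(1000)
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(process_chunk, chunked(data, 100)))

total = sum(partials)
print(total)  # 332833500, the sum of squares 0..999
```

For a CPU-bound `process_chunk`, the same structure works with `ProcessPoolExecutor`, where chunking also reduces pickling round-trips.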
- Common use cases for concurrency:
  - Web servers
  - API calls
  - Data processing
  - Machine learning workloads
  - Grid search operations
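A grid search, for instance, maps naturally onto an executor because each parameter combination is evaluated independently. The objective function below is hypothetical, standing in for training and scoring a model:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def evaluate(params):
    # Hypothetical objective: in practice this would train/score a model
    lr, depth = params
    return (params, lr * depth)

grid = list(product([0.01, 0.1], [3, 5, 7]))  # 2 x 3 = 6 combinations

with ThreadPoolExecutor() as pool:
    scores = dict(pool.map(evaluate, grid))

best = max(scores, key=scores.get)
print(best)  # (0.1, 7)
```

Because each evaluation is independent, the same pattern scales out directly to Dask or Ray workers on a cluster.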
- Both Dask and Ray build on concepts similar to those in `concurrent.futures` but add capabilities for:
  - Distributed execution
  - Data serialization
  - Cluster management
  - Worker coordination