Itamar Turner-Trauring - Optimize first, parallelize second: a better path to faster data processing
Learn practical strategies for optimizing single-core performance before parallelization, with techniques for better algorithms, data structures, and resource efficiency.
- Optimize code performance on a single core before attempting parallelization - this provides better cost efficiency and resource utilization
- Architecture choices and algorithmic improvements typically offer larger performance gains than parallelization alone
- Parallelization has limitations:
    - Doesn't reduce total computation costs
    - Some algorithms cannot be parallelized effectively
    - Scaling costs increase linearly with cores
    - Added complexity of distributed systems
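The second limitation above is usually quantified with Amdahl's law, a standard result not spelled out in the talk summary: if only a fraction `p` of the work can run in parallel, the serial remainder caps the achievable speedup no matter how many cores you add. A minimal sketch:

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Amdahl's law: speedup on n cores when a fraction p of the
    work parallelizes and the rest (1 - p) stays serial."""
    return 1.0 / ((1.0 - p) + p / n)

# Even with 90% of the work parallelizable, 64 cores yield well under
# 10x speedup, while the core count (and the bill) grows 64x.
amdahl_speedup(0.9, 64)
```

This is why the summary stresses that scaling costs grow linearly with cores while returns do not.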
- Common performance issues include:
    - Accidental quadratic algorithms
    - Redundant calculations that could be cached
    - Inefficient data structures and algorithms
    - Not leveraging CPU architecture features
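To make the "accidentally quadratic" issue concrete, here is a hypothetical illustration (not from the talk): de-duplicating a sequence with a list-based membership check is O(n²), while switching the tracking structure to a set makes it O(n) with no change in behavior.

```python
def dedupe_quadratic(items):
    seen = []  # list membership check scans every element already seen
    result = []
    for item in items:
        if item not in seen:  # O(n) per check -> O(n^2) overall
            seen.append(item)
            result.append(item)
    return result

def dedupe_linear(items):
    seen = set()  # set membership check is O(1) on average
    result = []
    for item in items:
        if item not in seen:  # O(1) per check -> O(n) overall
            seen.add(item)
            result.append(item)
    return result
```

Both functions return identical results; only the data structure choice changes the asymptotic cost.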
- Tools and approaches for optimization:
    - Use query planners and lazy evaluation (e.g., Polars vs Pandas)
    - Leverage low-level optimized libraries (Numba, C/Rust)
    - Profile code to identify bottlenecks
    - Cache repeated calculations
    - Filter data early in the pipeline
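For the "cache repeated calculations" point, Python's standard library makes this a one-line change; the Fibonacci function below is just a stand-in workload, not an example from the talk:

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # memoize: each distinct n is computed once
def fib(n: int) -> int:
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

# Without the cache this recursion does exponentially many redundant
# calls; with it, fib(50) finishes in microseconds.
fib(50)
```

The same idea applies to any pure, repeatedly invoked computation in a data pipeline.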
- Faster single-core speeds are limited by hardware constraints - throwing money at faster CPUs has diminishing returns
- Consider environmental and resource costs - more efficient code reduces energy consumption and CO2 emissions
- Performance improvements are cumulative - architectural, algorithmic, and low-level optimizations can multiply together for significant speedups
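The multiplicative effect can be seen with toy numbers (my own illustration, not figures from the talk): modest gains at each layer compound into a large overall speedup.

```python
# Hypothetical, independent speedups at each layer of the stack:
architectural = 3.0  # e.g. filter data early in the pipeline
algorithmic = 10.0   # e.g. replace an accidentally quadratic algorithm
low_level = 4.0      # e.g. compile the hot loop with Numba or Rust

combined = architectural * algorithmic * low_level  # 120.0x overall
```

A 120x single-core speedup like this would outstrip what most teams can affordably buy with parallel hardware.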
- Development velocity improves with faster code - shorter feedback loops allow more iterations and experiments
- Optimize based on the specific use case - batch processing and interactive applications require different approaches