Itamar Turner-Trauring - Optimize first, parallelize second: a better path to faster data processing

Learn practical strategies for optimizing single-core performance before parallelization, with techniques for better algorithms, data structures, and resource efficiency.

Key takeaways
  • Optimize code performance on a single core before attempting parallelization - this provides better cost efficiency and resource utilization

  • Architecture choices and algorithmic improvements typically offer larger performance gains than parallelization alone

  • Parallelization has limitations:

    • Doesn’t reduce total computation costs
    • Some algorithms cannot be parallelized effectively
    • Costs scale linearly with core count, while speedups are often sublinear
    • Adds the complexity of distributed systems
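
The limits above are commonly formalized as Amdahl's law (a standard result, not named in the source): if only a fraction p of a job can run in parallel, n cores give at best a speedup of 1 / ((1 − p) + p / n). A minimal sketch:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Best-case speedup when only `parallel_fraction` of the work parallelizes."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

# Even with 90% of the work parallelizable, 32 cores deliver well under a
# 32x speedup, while the core count (and thus the cost) grows 32x.
for cores in (4, 16, 32):
    print(f"{cores} cores: {amdahl_speedup(0.9, cores):.1f}x speedup")
```

The 0.9 parallel fraction here is an illustrative assumption; real workloads vary widely.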
  • Common performance issues include:

    • Accidental quadratic algorithms
    • Redundant calculations that could be cached
    • Inefficient data structures and algorithms
    • Not leveraging CPU architecture features
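
An "accidental quadratic" often hides in something as innocent as repeated membership tests. A minimal sketch (the deduplication task is hypothetical, not from the article): `in` on a Python list scans the whole list each time, so the loop is O(n²); switching to a set makes each lookup O(1) on average:

```python
def dedupe_quadratic(items):
    """Accidentally quadratic: `in` on a list scans it linearly each time."""
    seen = []
    out = []
    for item in items:
        if item not in seen:  # O(len(seen)) scan per item -> O(n^2) total
            seen.append(item)
            out.append(item)
    return out

def dedupe_linear(items):
    """Same result, but set membership is O(1) on average -> O(n) total."""
    seen = set()
    out = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out
```

Both return the same answer; only the second scales to large inputs.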
  • Tools and approaches for optimization:

    • Use query planners and lazy evaluation (e.g., Polars vs Pandas)
    • Leverage low-level optimized libraries (Numba, C/Rust)
    • Profile code to identify bottlenecks
    • Cache repeated calculations
    • Filter data early in the pipeline
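
Two of the techniques above, caching repeated calculations and filtering early, can be sketched with just the standard library (the records and the `expensive_score` function are hypothetical stand-ins):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def expensive_score(category: str) -> float:
    """Stand-in for a costly per-category computation; cached after first call."""
    return sum(ord(c) for c in category) / 100.0  # placeholder work

def process(records):
    # Filter early: drop inactive rows before doing any expensive work on them.
    active = (r for r in records if r["active"])
    return [dict(r, score=expensive_score(r["category"])) for r in active]

records = [
    {"active": True, "category": "books"},
    {"active": False, "category": "toys"},   # filtered out before scoring
    {"active": True, "category": "books"},   # second "books" hits the cache
]
```

Query planners like Polars apply the same ideas automatically: lazy evaluation lets them push filters down and avoid redundant work before anything executes.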
  • Faster single-core speeds are limited by hardware constraints - throwing money at faster CPUs has diminishing returns

  • Consider environmental and resource costs - more efficient code reduces energy consumption and CO2 emissions

  • Performance improvements are cumulative - architectural, algorithmic and low-level optimizations can multiply together for significant speedups
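
To make the cumulative point concrete, a back-of-the-envelope calculation (the individual factors are invented for illustration): independent speedups at different layers multiply rather than add.

```python
from math import prod

# Hypothetical, independent speedups at each layer:
speedups = {
    "architecture (filter data early)": 3.0,
    "algorithm (quadratic -> linear)": 5.0,
    "low-level (compiled inner loop)": 4.0,
}

combined = prod(speedups.values())  # 3 * 5 * 4 = 60x overall
print(f"combined speedup: {combined:.0f}x")
```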

  • Development velocity improves with faster code - shorter feedback loops allow more iterations and experiments

  • Optimize based on specific use case - batch processing vs interactive applications require different approaches