Itamar Turner-Trauring - Optimize first, parallelize second: a better path to faster data processing

Learn practical strategies for optimizing single-core performance before parallelization, with techniques for better algorithms, data structures, and resource efficiency.

Key takeaways
  • Optimize code performance on a single core before attempting parallelization - this provides better cost efficiency and resource utilization

  • Architecture choices and algorithmic improvements typically offer larger performance gains than parallelization alone

  • Parallelization has limitations:

    • Doesn’t reduce total computation costs
    • Some algorithms cannot be parallelized effectively
    • Costs scale linearly with core count, while speedups are often sublinear
    • Adds the complexity of distributed systems
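
The limits above are commonly formalized as Amdahl's law (a standard result, not named in the source): if only a fraction p of a job can run in parallel, n cores give at best a speedup of 1 / ((1 − p) + p / n). A minimal sketch:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Best-case speedup when only `parallel_fraction` of the work parallelizes."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

# Even with 90% of the work parallelizable, 32 cores deliver well under a
# 32x speedup, while the core count (and thus the cost) grows 32x.
for cores in (4, 16, 32):
    print(f"{cores} cores: {amdahl_speedup(0.9, cores):.1f}x speedup")
```

The 0.9 parallel fraction here is an illustrative assumption; real workloads vary widely.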
  • Common performance issues include:

    • Accidental quadratic algorithms
    • Redundant calculations that could be cached
    • Inefficient data structures and algorithms
    • Not leveraging CPU architecture features
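
An "accidental quadratic" often hides in something as innocent as repeated membership tests. A minimal sketch (the deduplication task is hypothetical, not from the article): `in` on a Python list scans the whole list each time, so the loop is O(n²); switching to a set makes each lookup O(1) on average:

```python
def dedupe_quadratic(items):
    """Accidentally quadratic: `in` on a list scans it linearly each time."""
    seen = []
    out = []
    for item in items:
        if item not in seen:  # O(len(seen)) scan per item -> O(n^2) total
            seen.append(item)
            out.append(item)
    return out

def dedupe_linear(items):
    """Same result, but set membership is O(1) on average -> O(n) total."""
    seen = set()
    out = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out
```

Both return the same answer; only the second scales to large inputs.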
  • Tools and approaches for optimization:

    • Use query planners and lazy evaluation (e.g., Polars vs Pandas)
    • Leverage low-level optimized libraries (Numba, C/Rust)
    • Profile code to identify bottlenecks
    • Cache repeated calculations
    • Filter data early in the pipeline
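
Two of the techniques above, caching repeated calculations and filtering early, can be sketched with just the standard library (the records and the `expensive_score` function are hypothetical stand-ins):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def expensive_score(category: str) -> float:
    """Stand-in for a costly per-category computation; cached after first call."""
    return sum(ord(c) for c in category) / 100.0  # placeholder work

def process(records):
    # Filter early: drop inactive rows before doing any expensive work on them.
    active = (r for r in records if r["active"])
    return [dict(r, score=expensive_score(r["category"])) for r in active]

records = [
    {"active": True, "category": "books"},
    {"active": False, "category": "toys"},   # filtered out before scoring
    {"active": True, "category": "books"},   # second "books" hits the cache
]
```

Query planners like Polars apply the same ideas automatically: lazy evaluation lets them push filters down and avoid redundant work before anything executes.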
  • Faster single-core speeds are limited by hardware constraints - throwing money at faster CPUs has diminishing returns

  • Consider environmental and resource costs - more efficient code reduces energy consumption and CO2 emissions

  • Performance improvements are cumulative - architectural, algorithmic and low-level optimizations can multiply together for significant speedups
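
To make the cumulative point concrete, a back-of-the-envelope calculation (the individual factors are invented for illustration): independent speedups at different layers multiply rather than add.

```python
from math import prod

# Hypothetical, independent speedups at each layer:
speedups = {
    "architecture (filter data early)": 3.0,
    "algorithm (quadratic -> linear)": 5.0,
    "low-level (compiled inner loop)": 4.0,
}

combined = prod(speedups.values())  # 3 * 5 * 4 = 60x overall
print(f"combined speedup: {combined:.0f}x")
```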

  • Development velocity improves with faster code - shorter feedback loops allow more iterations and experiments

  • Optimize based on specific use case - batch processing vs interactive applications require different approaches