Vincent D. Warmerdam - Run a benchmark they said. It will be fun they said. | PyData Amsterdam 2024

Learn essential benchmarking tips for data science: from test set sizing to hyperparameter optimization. Discover how to avoid common pitfalls & implement best practices.

Key takeaways
  • When running benchmarks, focus on smaller, well-specified problems rather than attempting large, complex benchmarks that can become a distraction

  • Use random search instead of grid search for hyperparameter optimization - it covers the search space more efficiently and makes the compute budget a single, explicit knob (see the random-search sketch after this list)

  • Leverage caching at multiple levels (estimator level, generator level) to avoid recomputing values and save significant computation time (see the caching sketch after this list)

  • Be mindful of data quality issues (see the data-audit sketch after this list) such as:

    • Bad labels and bias in human annotations
    • Different datetime formats
    • Missing values
    • Time series order preservation
    • Categorical feature handling
  • Model comparisons need careful consideration of:

    • Test set size and its impact on statistical power (see the test-set-size sketch after this list)
    • Cross-validation strategy effects
    • Hardware resource usage (memory, compute)
    • Default parameter sensitivity
    • Impact of feature preprocessing steps
  • Use parallelization tools like joblib with generators to distribute workloads efficiently across available compute resources (see the parallelization sketch after this list)

  • Visualizations like parallel coordinates plots can reveal insights about hyperparameter importance and model behavior (see the parallel-coordinates sketch after this list)

  • Don’t focus solely on metric improvements - consider practical tradeoffs like training time and memory usage

  • The perceived improvement in benchmark scores can be an illusion due to factors like:

    • Random seed optimization (see the seed-variance sketch after this list)
    • Test set size manipulation
    • Overfitting hyperparameters to the test set
  • Features and data quality often matter more than model architecture choice - most models perform similarly with proper preprocessing
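
The sketches below expand on several of the takeaways; the datasets, models, and parameter values in them are illustrative rather than taken from the talk.

The random-search sketch: a minimal example of random search with scikit-learn's RandomizedSearchCV, assuming a synthetic dataset and a HistGradientBoostingClassifier; the parameter ranges and the `n_iter` budget are arbitrary choices for illustration.

```python
# Minimal random-search sketch; dataset, model, and ranges are illustrative.
from scipy.stats import loguniform, randint
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=2_000, n_features=20, random_state=0)

search = RandomizedSearchCV(
    HistGradientBoostingClassifier(random_state=0),
    param_distributions={
        "learning_rate": loguniform(1e-3, 3e-1),
        "max_depth": randint(2, 10),
        "max_leaf_nodes": randint(8, 64),
    },
    n_iter=25,  # the compute budget is this single knob
    cv=5,
    n_jobs=-1,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Unlike a grid, the budget is just `n_iter`, so it can be scaled up or down to match the hardware available.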
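
The caching sketch: caching at two levels, assuming joblib's Memory for the data-loading (generator) level and scikit-learn's Pipeline `memory` argument for the estimator level; `load_benchmark_data` is a hypothetical loader.

```python
# Caching at two levels; load_benchmark_data is a hypothetical loader.
from joblib import Memory
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

memory = Memory("cache_dir", verbose=0)

@memory.cache  # loader level: the expensive data prep runs once per argument set
def load_benchmark_data(n_samples):
    return make_classification(n_samples=n_samples, n_features=50, random_state=0)

# estimator level: fitted transformers are cached, so repeated search runs
# do not refit the scaler on identical inputs
pipe = Pipeline(
    [("scale", StandardScaler()), ("model", LogisticRegression(max_iter=1_000))],
    memory=memory,
)

X, y = load_benchmark_data(10_000)
pipe.fit(X, y)
```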
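
The data-audit sketch: a few quick pandas checks covering part of the data-quality checklist (missing values, mixed datetime formats, explicit categorical handling, order-preserving splits for time series); the DataFrame and its column names are made up for illustration.

```python
# Quick data-quality audit; the DataFrame and its columns are hypothetical.
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

df = pd.DataFrame({
    "when": ["2024-09-05", "05/09/2024", None],  # mixed datetime formats plus a missing value
    "label": ["spam", "ham", "spam"],
    "amount": [10.0, None, 3.5],
})

# missing values per column
print(df.isna().sum())

# parse dates with an explicit format; errors="coerce" turns unparseable
# values into NaT instead of silently misreading them
df["when"] = pd.to_datetime(df["when"], format="%Y-%m-%d", errors="coerce")
print(int(df["when"].isna().sum()), "rows with a missing or unparseable date")

# make categorical handling explicit instead of leaving strings as objects
df["label"] = df["label"].astype("category")
print(df.dtypes)

# for time series, keep the temporal order in cross-validation instead of shuffling
cv = TimeSeriesSplit(n_splits=5)
```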
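
The test-set-size sketch: the statistical-power point can be made concrete with the binomial standard error of an accuracy estimate, sqrt(p * (1 - p) / n); the accuracy value and test-set sizes below are illustrative.

```python
# Standard error of an accuracy estimate as a function of test set size.
# The numbers are illustrative; the point is that a small test set cannot
# resolve small differences between models.
import math

accuracy = 0.90
for n_test in (100, 1_000, 10_000):
    se = math.sqrt(accuracy * (1 - accuracy) / n_test)
    # a rough 95% interval is +/- 1.96 * se
    print(f"n_test={n_test:>6}: {accuracy:.2f} +/- {1.96 * se:.3f}")
```

With 100 test rows the interval is roughly ±0.06, which swamps the one-point improvements benchmarks often chase.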
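
The parallelization sketch: joblib's Parallel consuming a generator of delayed tasks; `run_config` is a hypothetical function standing in for whatever a single benchmark run does.

```python
# Distributing benchmark runs with joblib; run_config is a hypothetical stand-in.
from joblib import Parallel, delayed

def run_config(seed, learning_rate):
    # placeholder: a real benchmark would fit and score a model here
    return {"seed": seed, "learning_rate": learning_rate}

# a generator of delayed tasks, so nothing is materialised up front
tasks = (
    delayed(run_config)(seed, lr)
    for seed in range(10)
    for lr in (0.01, 0.1, 0.3)
)

# n_jobs=-1 uses all available cores; joblib pulls tasks from the generator
results = Parallel(n_jobs=-1)(tasks)
print(len(results), "runs finished")
```

Recent joblib versions can also stream results back lazily with `Parallel(return_as="generator")`, which helps when a benchmark produces more output than fits comfortably in memory.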
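
The parallel-coordinates sketch: plotting search results as parallel coordinates, here with plotly express (a library choice assumed for illustration); the hyperparameter values and scores are synthetic.

```python
# Parallel coordinates over synthetic hyperparameter-search results.
import numpy as np
import pandas as pd
import plotly.express as px

rng = np.random.default_rng(0)
results = pd.DataFrame({
    "learning_rate": rng.uniform(0.01, 0.3, size=50),
    "max_depth": rng.integers(2, 10, size=50),
    "n_estimators": rng.integers(50, 500, size=50),
    "score": rng.uniform(0.7, 0.9, size=50),  # synthetic scores, illustration only
})

fig = px.parallel_coordinates(
    results,
    color="score",
    dimensions=["learning_rate", "max_depth", "n_estimators", "score"],
)
fig.show()
```

Lines that bunch together on one axis before reaching high scores hint at which hyperparameters actually matter.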
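
The seed-variance sketch: refitting the same model with different random seeds to see how much of a "gain" the seed alone can produce; the dataset and model are illustrative.

```python
# Refit the same model with different seeds; the spread shows how much
# apparent improvement can come from the seed alone. Everything here is illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2_000, n_features=20, random_state=0)

scores = []
for seed in range(20):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    scores.append(model.fit(X_tr, y_tr).score(X_te, y_te))

print(f"min={min(scores):.3f} max={max(scores):.3f} spread={max(scores) - min(scores):.3f}")
```

If the spread across seeds is comparable to the gain being reported, the gain is not evidence of a better model.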