Vincent D. Warmerdam - Run a benchmark they said. It will be fun they said. | PyData Amsterdam 2024
Learn essential benchmarking tips for data science: from test set sizing to hyperparameter optimization. Discover how to avoid common pitfalls & implement best practices.
- When running benchmarks, focus on solving simpler, specific problems rather than attempting large, complex benchmarks that become a distraction in themselves
- Use random search instead of grid search for hyperparameter optimization; it is more efficient and gives better control over the compute budget
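A minimal sketch of what that can look like with scikit-learn's RandomizedSearchCV; the estimator, search space, and `n_iter` budget are illustrative assumptions, not taken from the talk:

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=1_000, random_state=0)

search = RandomizedSearchCV(
    LogisticRegression(max_iter=1_000),
    param_distributions={"C": loguniform(1e-3, 1e3)},
    n_iter=25,          # fixed compute budget, unlike an exhaustive grid
    cv=5,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Unlike a grid, the number of sampled configurations is independent of how many hyperparameters you search over, so the budget stays under your control.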
- Leverage caching mechanisms at multiple levels (estimator level, generator level) to avoid recomputing values and save significant computation time
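One way to do this is on-disk caching with joblib.Memory, so repeated calls with the same arguments are read back from cache instead of refit; the `fit_and_score` function and cache directory below are illustrative assumptions:

```python
from joblib import Memory
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

memory = Memory(".benchmark_cache", verbose=0)

@memory.cache
def fit_and_score(C):
    # Expensive step: build data, fit, and cross-validate
    X, y = make_classification(n_samples=1_000, random_state=0)
    model = LogisticRegression(C=C, max_iter=1_000)
    return cross_val_score(model, X, y, cv=5).mean()

print(fit_and_score(1.0))  # computed once
print(fit_and_score(1.0))  # served from the on-disk cache
```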
- Be mindful of data quality issues (a few checks are sketched after this list), such as:
  - Bad labels and bias in human annotations
  - Inconsistent datetime formats
  - Missing values
  - Preserving the order of time series data
  - Handling of categorical features
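A small pandas sketch of checks for a few of these issues; the column names and formats are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": ["2024-01-05", "05/01/2024", None],
    "category": ["a", "b", "b"],
    "target": [1, 0, 1],
})

# Missing values per column
print(df.isna().sum())

# Mixed datetime formats: parse explicitly rather than relying on inference
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%Y-%m-%d", errors="coerce")
print(df["timestamp"].isna().sum(), "rows missing or in an unexpected format")

# Keep time series in chronological order before splitting
df = df.sort_values("timestamp")
```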
- Model comparisons need careful consideration of (see the sketch after this list):
  - The impact of test set size on statistical power
  - The effect of the cross-validation strategy
  - Hardware resource usage (memory, compute)
  - Sensitivity to default parameters
  - The impact of feature preprocessing steps
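To make the test set size point concrete, this rough simulation (assuming a true accuracy of 0.8, a figure chosen for illustration rather than taken from the talk) shows how the spread of measured scores shrinks as the test set grows, which is what determines whether a small difference between models is detectable at all:

```python
import numpy as np

rng = np.random.default_rng(0)
true_accuracy = 0.8

for n_test in [100, 1_000, 10_000]:
    # Simulate many test sets of this size and look at the spread of scores
    scores = rng.binomial(n_test, true_accuracy, size=2_000) / n_test
    print(f"n_test={n_test:>6}  score std={scores.std():.4f}")
```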
- Use parallelization tools like joblib with generators to distribute workloads efficiently across the available compute resources
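A sketch of the generator-based pattern; it assumes joblib >= 1.3 for `return_as="generator"`, and `train_model` is a stand-in for a real fit-and-score step:

```python
from joblib import Parallel, delayed

def train_model(seed):
    # Stand-in for an expensive fit + score
    return seed * 2

results = Parallel(n_jobs=-1, return_as="generator")(
    delayed(train_model)(seed) for seed in range(100)
)

# Results can be consumed (and e.g. written to disk) as they finish,
# instead of holding everything in memory at once.
for score in results:
    print(score)
```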
- Visualizations such as parallel coordinates plots can reveal insights about hyperparameter importance and model behavior
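One possible way to build such a plot from search results with pandas; the results table and the score bucketing are illustrative choices, not from the talk:

```python
import matplotlib.pyplot as plt
import pandas as pd

results = pd.DataFrame({
    "learning_rate": [0.01, 0.1, 0.3, 0.01, 0.1],
    "max_depth": [3, 5, 7, 7, 3],
    "score": [0.81, 0.84, 0.80, 0.79, 0.85],
})
# Bucket the score so each line can be colored by how well that run did
results["score_bucket"] = pd.qcut(results["score"], q=2, labels=["low", "high"]).astype(str)

pd.plotting.parallel_coordinates(
    results.drop(columns="score"), class_column="score_bucket"
)
plt.show()
```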
- Don't focus solely on metric improvements; also consider practical trade-offs such as training time and memory usage
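A rough way to record such trade-offs next to the metric; timing the fit as below is an assumed approach for illustration, not the talk's tooling:

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=5_000, random_state=0)

for model in [LogisticRegression(max_iter=1_000), RandomForestClassifier(random_state=0)]:
    start = time.perf_counter()
    model.fit(X, y)
    elapsed = time.perf_counter() - start
    name = model.__class__.__name__
    print(f"{name:<25} train acc={model.score(X, y):.3f}  fit time={elapsed:.2f}s")
```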
- The perceived improvement in benchmark scores can be an illusion caused by factors such as (see the sketch after this list):
  - Random seed optimization
  - Test set size manipulation
  - Hyperparameter overfitting
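A tiny demonstration of the random-seed effect: rerunning the same model on differently seeded splits already moves the score, so cherry-picking the best seed inflates the benchmark without any real improvement. The dataset and model below are assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)

scores = []
for seed in range(20):
    # Only the split seed changes between runs
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    model = LogisticRegression(max_iter=1_000).fit(X_tr, y_tr)
    scores.append(model.score(X_te, y_te))

print(f"worst seed: {min(scores):.3f}, best seed: {max(scores):.3f}")
```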
- Features and data quality often matter more than the choice of model architecture; most models perform similarly with proper preprocessing