Talks - Michael Droettboom: Measuring the performance of CPython
Learn how Microsoft's CPython team measures Python performance using PyPerformance benchmarks, statistical techniques, and continuous testing to drive optimizations.
-
The CPython Performance Engineering team at Microsoft uses the PyPerformance suite, which contains over 100 benchmarks, to measure Python performance
-
Benchmarks are categorized into three main types:
- Application benchmarks (full applications like Django CMS)
- Toy benchmarks (simple <100 line programs)
- Microbenchmarks (testing specific language features)
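As a rough illustration of the microbenchmark category: pyperformance benchmarks are written on top of the pyperf module, and a minimal microbenchmark can look like the sketch below. The benchmark name and the attribute-access statement are made up, not taken from the suite.

```python
# Sketch of a microbenchmark in the pyperf style (pyperformance's benchmarks
# are built on the pyperf module). The timed statement is a made-up example
# of exercising a single language feature.
import pyperf

runner = pyperf.Runner()
runner.timeit(
    "attr_access",                       # hypothetical benchmark name
    stmt="p.x",                          # the operation being measured
    setup="class P: x = 1\np = P()",
)
```

pyperf takes care of warmups, spawning multiple worker processes, and collecting the timing statistics, which is part of why the suite standardizes on it.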
-
Key challenges in benchmarking include:
- System noise from OS/other processes
- CPU thermal management and speed variations
- Memory layout randomization
- Virtual machines introducing extra noise
- Benchmark warmup time
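Several of these noise sources can be mitigated in the harness itself, by pinning the process to a single core and aggregating many repetitions. A rough, Linux-oriented sketch follows; the core number, warmup count, and repetition count are arbitrary assumptions, and pyperf automates all of this in the real suite.

```python
# Rough sketch of noise-reduction tactics: pin the process to one CPU core
# (Linux-only) and take the median of many repetitions so outliers caused by
# other processes matter less. Core number and counts are arbitrary choices.
import os
import statistics
import time


def bench(fn, warmups=5, repeats=20):
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, {3})   # pin to an (ideally isolated) core
    for _ in range(warmups):           # warm caches and any lazy setup
        fn()
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples), statistics.stdev(samples)


if __name__ == "__main__":
    median, spread = bench(lambda: sum(range(100_000)))
    print(f"median={median:.6f}s stdev={spread:.6f}s")
```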
-
Performance improvements typically come from many small 1% optimizations stacked together rather than from a single major breakthrough
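As a back-of-the-envelope illustration of how the stacking works (the count of twenty optimizations is illustrative, not a figure from the talk):

```python
# Small multiplicative wins compound: twenty independent ~1% speedups
# amount to roughly a 22% overall improvement.
speedup = 1.01 ** 20
print(f"{speedup:.3f}x")   # ~1.22x
```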
-
The team runs benchmarks on bare metal hardware to reduce noise, with typical noise levels around ±1% when properly controlled
-
Statistical techniques used:
- Running benchmarks multiple times
- Hierarchical Performance Testing (HPT)
- Distribution analysis
- Geometric mean for aggregating results
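A small sketch of the aggregation step, computing the geometric mean of per-benchmark speedup ratios; the benchmark names and timings here are invented:

```python
# Aggregate per-benchmark speedups (baseline time / new time) into a single
# number with the geometric mean. All timings below are made up.
from statistics import geometric_mean

baseline = {"django_cms": 2.00, "json_loads": 0.50, "nbody": 1.20}
patched  = {"django_cms": 1.90, "json_loads": 0.49, "nbody": 1.23}

ratios = [baseline[name] / patched[name] for name in baseline]
print(f"geometric mean speedup: {geometric_mean(ratios):.3f}x")
```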
-
Different benchmarks spend their time in different areas:
- 54 benchmarks spend most of their time in the interpreter
- Others are split between library code, memory management, and the kernel
- It is important to understand where each benchmark spends its time
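One way to find out where a benchmark spends its time is to profile its body; a minimal sketch with the standard-library cProfile (the workload is a stand-in, not one of the suite's benchmarks):

```python
# Minimal profiling sketch: run a workload under cProfile and list the
# functions where most time is spent. The workload is a stand-in, not an
# actual pyperformance benchmark.
import cProfile
import pstats


def workload():
    return sorted(str(i) for i in range(200_000))


profiler = cProfile.Profile()
profiler.runcall(workload)
pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)  # top 10
```

Note that cProfile only attributes time at the Python level; splitting it between the interpreter, library code, memory management, and the kernel requires a native profiler such as Linux perf.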
-
Continuous benchmarking helps evaluate changes:
- Tests changes against main branch
- Takes ~1.5-2.5 hours per run
- Security considerations limit public access
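A deliberately simplified sketch of the comparison step, flagging a benchmark only when its mean change exceeds the roughly ±1% noise floor mentioned above; the timings are invented, and the talk's Hierarchical Performance Testing approach is considerably more rigorous than this plain comparison of means:

```python
# Simplified comparison of a change against main: report a benchmark only
# when the mean difference exceeds the ~1% noise floor. Timings are invented.
from statistics import mean

NOISE = 0.01  # ~±1% noise on well-controlled bare-metal runs


def compare(main_samples, change_samples):
    base, new = mean(main_samples), mean(change_samples)
    delta = (new - base) / base
    if abs(delta) <= NOISE:
        return "within noise"
    return f"{'slower' if delta > 0 else 'faster'} by {abs(delta):.1%}"


print(compare([1.00, 1.01, 0.99], [0.95, 0.96, 0.94]))  # faster by ~5%
```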
-
Future needs include:
- More real-world application benchmarks
- Better parallel/threading benchmarks
- Reduced benchmark runtime while maintaining value