Talks - Michael Droettboom: Measuring the performance of CPython
Learn how Microsoft's CPython team measures Python performance using PyPerformance benchmarks, statistical techniques, and continuous testing to drive optimizations.
-
The CPython Performance Engineering team at Microsoft uses the PyPerformance suite, which contains over 100 benchmarks, to measure Python performance
-
Benchmarks are categorized into three main types:
- Application benchmarks (full applications like Django CMS)
- Toy benchmarks (simple <100 line programs)
- Microbenchmarks (testing specific language features)
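As a rough illustration of the microbenchmark category: pyperformance benchmarks are written on top of the pyperf module, and a minimal microbenchmark can look like the sketch below. The benchmark name and the attribute-access statement are made up, not taken from the suite.

```python
# Sketch of a microbenchmark in the pyperf style (pyperformance's benchmarks
# are built on the pyperf module). The timed statement is a made-up example
# of exercising a single language feature.
import pyperf

runner = pyperf.Runner()
runner.timeit(
    "attr_access",                       # hypothetical benchmark name
    stmt="p.x",                          # the operation being measured
    setup="class P: x = 1\np = P()",
)
```

pyperf takes care of warmups, spawning multiple worker processes, and collecting the timing statistics, which is part of why the suite standardizes on it.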
-
Key challenges in benchmarking include:
- System noise from OS/other processes
- CPU thermal management and speed variations
- Memory layout randomization
- Virtual machines introducing extra noise
- Benchmark warmup time
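Several of these noise sources can be mitigated in the harness itself, by pinning the process to a single core and aggregating many repetitions. A rough, Linux-oriented sketch follows; the core number, warmup count, and repetition count are arbitrary assumptions, and pyperf automates all of this in the real suite.

```python
# Rough sketch of noise-reduction tactics: pin the process to one CPU core
# (Linux-only) and take the median of many repetitions so outliers caused by
# other processes matter less. Core number and counts are arbitrary choices.
import os
import statistics
import time


def bench(fn, warmups=5, repeats=20):
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, {3})   # pin to an (ideally isolated) core
    for _ in range(warmups):           # warm caches and any lazy setup
        fn()
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples), statistics.stdev(samples)


if __name__ == "__main__":
    median, spread = bench(lambda: sum(range(100_000)))
    print(f"median={median:.6f}s stdev={spread:.6f}s")
```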
-
Performance improvements typically come from many small 1% optimizations stacked together rather than from a single major breakthrough
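As a back-of-the-envelope illustration of how the stacking works (the count of twenty optimizations is illustrative, not a figure from the talk):

```python
# Small multiplicative wins compound: twenty independent ~1% speedups
# amount to roughly a 22% overall improvement.
speedup = 1.01 ** 20
print(f"{speedup:.3f}x")   # ~1.22x
```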
-
The team runs benchmarks on bare metal hardware to reduce noise, with typical noise levels around ±1% when properly controlled
-
Statistical techniques used:
- Running benchmarks multiple times
- Hierarchical Performance Testing (HPT)
- Distribution analysis
- Geometric mean for aggregating results
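A small sketch of the aggregation step, computing the geometric mean of per-benchmark speedup ratios; the benchmark names and timings here are invented:

```python
# Aggregate per-benchmark speedups (baseline time / new time) into a single
# number with the geometric mean. All timings below are made up.
from statistics import geometric_mean

baseline = {"django_cms": 2.00, "json_loads": 0.50, "nbody": 1.20}
patched  = {"django_cms": 1.90, "json_loads": 0.49, "nbody": 1.23}

ratios = [baseline[name] / patched[name] for name in baseline]
print(f"geometric mean speedup: {geometric_mean(ratios):.3f}x")
```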
-
Different benchmarks spend their time in different areas:
- 54 benchmarks spend most of their time in the interpreter
- Others are split between library code, memory management, and the kernel
- It is important to understand where each benchmark spends its time
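One way to find out where a benchmark spends its time is to profile its body; a minimal sketch with the standard-library cProfile (the workload is a stand-in, not one of the suite's benchmarks):

```python
# Minimal profiling sketch: run a workload under cProfile and list the
# functions where most time is spent. The workload is a stand-in, not an
# actual pyperformance benchmark.
import cProfile
import pstats


def workload():
    return sorted(str(i) for i in range(200_000))


profiler = cProfile.Profile()
profiler.runcall(workload)
pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)  # top 10
```

Note that cProfile only attributes time at the Python level; splitting it between the interpreter, library code, memory management, and the kernel requires a native profiler such as Linux perf.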
-
Continuous benchmarking helps evaluate changes:
- Tests changes against main branch
- Takes ~1.5-2.5 hours per run
- Security considerations limit public access
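A deliberately simplified sketch of the comparison step, flagging a benchmark only when its mean change exceeds the roughly ±1% noise floor mentioned above; the timings are invented, and the talk's Hierarchical Performance Testing approach is considerably more rigorous than this plain comparison of means:

```python
# Simplified comparison of a change against main: report a benchmark only
# when the mean difference exceeds the ~1% noise floor. Timings are invented.
from statistics import mean

NOISE = 0.01  # ~±1% noise on well-controlled bare-metal runs


def compare(main_samples, change_samples):
    base, new = mean(main_samples), mean(change_samples)
    delta = (new - base) / base
    if abs(delta) <= NOISE:
        return "within noise"
    return f"{'slower' if delta > 0 else 'faster'} by {abs(delta):.1%}"


print(compare([1.00, 1.01, 0.99], [0.95, 0.96, 0.94]))  # faster by ~5%
```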
-
Future needs include:
- More real-world application benchmarks
- Better parallel/threading benchmarks
- Reduced benchmark runtime while maintaining value