Talks - Pablo Galindo Salgado: Profiling at the speed of light

Learn how Python 3.12 & 3.13 enhance performance profiling with perf support. Discover tools for analyzing Python & C code, spotting bottlenecks, and optimizing speed.

Key takeaways
  • Python 3.12 added perf support with frame pointers, allowing profiling of both Python and C code simultaneously
  • Python 3.13 introduces enhanced perf JIT support that doesn’t require recompiling Python with frame pointers
  • Perf can provide detailed low-level insights like branch mispredictions, cache misses, and hardware counter data
  • Two main approaches for perf profiling in Python:
    • Using frame pointers (faster but requires recompilation)
    • Using debug information/DWARF (slower but more flexible)
  • Frame pointers have ~2-3% performance overhead while debug information can have up to 7% overhead
  • Perf enables visualization tools like flame graphs to analyze Python and C code performance
  • Perf can trace specific Python functions and C extensions simultaneously
  • Perf can help identify issues like:
    • GIL contention in multi-threaded applications
    • Garbage collector overhead
    • Branch misprediction problems
    • Memory access patterns
  • The ecosystem includes GUI tools like Hotspot for visualizing perf data
  • Perf can be used for both application-level profiling and low-level performance analysis
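
As a rough illustration of the Python 3.12 perf support mentioned above, the sketch below turns the perf trampoline on around a single call instead of for the whole process. It relies on sys.activate_stack_trampoline, sys.deactivate_stack_trampoline and sys.is_stack_trampoline_active, which exist in Python 3.12+ and are only functional on Linux. The helper name run_with_perf_trampoline and the workload busy_sum are hypothetical names chosen for this example, not something taken from the talk. On interpreters or platforms without perf support the helper simply runs the function unprofiled.

```python
import sys


def run_with_perf_trampoline(func, *args, **kwargs):
    """Call func with the perf trampoline enabled when the interpreter supports it.

    Hypothetical helper for illustration. Falls back to a plain call on
    Python < 3.12 or on platforms where the trampoline is unavailable.
    """
    activate = getattr(sys, "activate_stack_trampoline", None)
    enabled_here = False
    if activate is not None and not sys.is_stack_trampoline_active():
        try:
            activate("perf")  # "perf" is the backend name used by Python 3.12
            enabled_here = True
        except (ValueError, RuntimeError, OSError):
            # Raised when the build or platform has no perf trampoline support.
            enabled_here = False
    try:
        return func(*args, **kwargs)
    finally:
        # Only switch off what this helper switched on.
        if enabled_here:
            sys.deactivate_stack_trampoline()


def busy_sum(n):
    """Hypothetical workload: a pure-Python loop that perf can attribute by name."""
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    print(run_with_perf_trampoline(busy_sum, 1_000_000))
```

To profile a whole script instead, the same trampoline can be enabled from the outside with `python -X perf script.py` or the `PYTHONPERFSUPPORT=1` environment variable while running under `perf record`. Python 3.13 adds `-X perf_jit` for the DWARF-based mode that does not depend on frame pointers.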