Kyle Cranmer - Keynote: Particles, People, and Pull Requests | SciPy 2024

Kyle Cranmer explores how Python and open source transformed particle physics research, from tracking collisions to collaborative analysis and machine learning at the LHC.

Key takeaways
  • The Large Hadron Collider (LHC) processes an enormous amount of data: around 40 quadrillion collisions, with detectors containing ~100 million sensors that record 40 million times per second

  • Traditional HEP (High Energy Physics) software was siloed, C++-based, and not well integrated with modern data science tools and workflows. Moving to Python-based tools and array-oriented programming has enabled better collaboration and innovation

  • Simulation-based inference has emerged as a key technique, allowing researchers to compare observed data against simulated theories without requiring tractable likelihood functions

  • Open data and analysis preservation are important but challenging goals - there’s a balance to strike between making data accessible and maintaining incentives for investment in experiments and software

  • Development of tools like pyhf, Awkward Array, and Scikit-HEP has created a more modern ecosystem for HEP analysis that integrates better with the broader scientific Python community

  • Collaborative statistical modeling frameworks have enabled better sharing and reproduction of analyses between teams and experiments

  • Moving from centralized, monolithic software to modular, distributed approaches has made it easier for newcomers to contribute while preserving expertise

  • The field is transitioning from focusing solely on reproducibility to enabling analysis preservation and reuse, which provides more scientific value

  • Integration of machine learning techniques like active learning has improved the efficiency of analyzing these massive datasets

  • There’s growing recognition that software development and data science roles are essential to modern physics research, requiring dedicated resources and professional recognition
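
To make the simulation-based inference takeaway more concrete, here is a minimal sketch of one of the simplest likelihood-free methods, rejection sampling (approximate Bayesian computation). It is not the method presented in the talk, and real LHC analyses use far richer simulators and neural estimators. The toy simulator, the parameter name `mu`, the flat prior range, and the tolerance are all assumptions chosen for illustration. The point it shows is that the parameter is inferred purely by comparing simulated data against observed data, without ever writing down a likelihood function.

```python
import random
import statistics


def simulate(mu, n_events, rng):
    """Toy simulator (hypothetical): n_events Gaussian measurements centered on mu."""
    return [rng.gauss(mu, 1.0) for _ in range(n_events)]


def summary(data):
    """Summary statistic used to compare simulated and observed data."""
    return statistics.fmean(data)


def abc_rejection(observed, n_trials, tolerance, rng):
    """Keep prior draws whose simulated summary lands within `tolerance` of the observed one."""
    target = summary(observed)
    accepted = []
    for _ in range(n_trials):
        mu = rng.uniform(-5.0, 5.0)  # assumed flat prior on the parameter
        simulated = simulate(mu, len(observed), rng)
        if abs(summary(simulated) - target) < tolerance:
            accepted.append(mu)
    return accepted


rng = random.Random(0)  # fixed seed so the sketch is repeatable
observed = simulate(2.0, 50, rng)  # stand-in for real data, generated with mu = 2.0
posterior = abc_rejection(observed, n_trials=5000, tolerance=0.1, rng=rng)
posterior_mean = statistics.fmean(posterior)
```

The accepted values in `posterior` approximate the posterior distribution of `mu`. Tightening `tolerance` makes the approximation sharper but rejects more simulations, which is one reason large-scale applications move to learned surrogates instead of plain rejection.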