Kourlitis et al. - Scientific Python ecosystem helps answering fundamental questions of the Universe

Discover how Python's scientific libraries help CERN's ATLAS experiment process massive particle collision datasets to explore fundamental questions about our universe.

Key takeaways
  • The ATLAS experiment at CERN’s Large Hadron Collider (LHC) analyzes vast amounts of particle physics data to study fundamental questions about the universe

  • Key technical specs:

    • The LHC is a 27 km circular particle accelerator
    • Processes 40 million collisions per second
    • Generates exabytes of data
    • Only 1 in 10^13 collisions produces interesting events
  • CERN recently released 65 terabytes of ATLAS open data from 2015–2016 in the PHYSLITE format for research purposes

  • The scientific Python ecosystem is used for analysis:

    • The Awkward Array library handles complex jagged data structures (see the sketch after this list)
    • The COFFEA framework provides wrappers and tools on top of Awkward Array and Uproot
    • Dask enables distributed computing and lazy evaluation
    • Matplotlib for visualization
    • NumPy for numerical operations
  • Data processing workflow:

    • Raw collision data is highly structured with varying numbers of particles per event
    • Data selection and feature engineering extract relevant physics properties
    • Distributed computing across multiple nodes handles large-scale analysis
    • Results are often visualized as histograms (a workflow sketch follows this list)
  • Primary physics goals include:

    • Studying Higgs boson properties
    • Investigating dark matter
    • Understanding matter/antimatter asymmetry
    • Searching for new particles and forces
  • Analysis infrastructure spans multiple data centers worldwide with specialized storage and computing capabilities
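
To make the jagged-data point concrete, here is a minimal sketch of how Awkward Array handles events with varying numbers of particles. The toy values and the muon_pt name are invented for illustration; they are not taken from the ATLAS open data.

```python
import awkward as ak

# Toy events: each event contains a different number of muons, so the
# transverse-momentum (pt) values form a jagged array rather than a rectangle.
muon_pt = ak.Array([
    [45.2, 31.7],           # event with 2 muons
    [],                     # event with no muons
    [102.4, 56.1, 12.9],    # event with 3 muons
])

# Vectorized selection across the jagged structure: keep muons with pt > 30 GeV
selected = muon_pt[muon_pt > 30.0]

print(ak.num(selected))      # per-event muon counts: [2, 0, 2]
print(ak.to_list(selected))  # [[45.2, 31.7], [], [102.4, 56.1]]
```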
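The data-processing workflow (selection, feature engineering, histogramming) can be sketched with Uproot, Awkward Array, and Matplotlib. The file name, tree name, and branch name below are hypothetical stand-ins rather than the actual PHYSLITE schema; at full scale the same steps would typically run through COFFEA with Dask handling lazy, distributed execution.

```python
import uproot
import awkward as ak
import matplotlib.pyplot as plt

# Hypothetical file, tree, and branch names; real PHYSLITE files have a
# richer schema, so treat this purely as an illustration of the workflow.
with uproot.open("open_data_sample.root") as f:
    tree = f["analysis"]
    events = tree.arrays(["electron_pt"], library="ak")

# Data selection: keep events with at least two electrons above 25 GeV
good = events.electron_pt > 25.0
selected = events[ak.sum(good, axis=1) >= 2]

# Feature engineering: leading-electron pt in each selected event
leading_pt = ak.max(selected.electron_pt, axis=1)

# Results are typically presented as histograms
plt.hist(ak.to_numpy(leading_pt), bins=50, range=(0, 200))
plt.xlabel("Leading electron $p_T$ [GeV]")
plt.ylabel("Events")
plt.savefig("leading_electron_pt.png")
```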