Joshua Taillon - HyperSpy – Your Multidimensional Data Analysis Toolbox | SciPy 2024

Learn about HyperSpy, an open-source Python library for multidimensional data analysis, featuring visualization tools, signal processing, and machine learning capabilities.

Key takeaways
  • HyperSpy is an open-source Python library designed for interactive analysis of multidimensional data arrays, with particular strength in materials characterization and electron microscopy

  • The library provides robust tools for:

    • Visualization of multi-dimensional datasets
    • Signal processing and analysis
    • Curve and model fitting
    • Basic machine learning (decomposition and clustering)
    • Lazy processing with Dask integration
    • Metadata preservation and tracking
  • Core concepts include:

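    • Division of a dataset's dimensions into navigation axes (where each measurement was taken) and signal axes (what was measured at each position)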
    • NumPy-like syntax for data manipulation
    • Extensible architecture through subclassing
    • Integration with the scientific Python ecosystem
  • Notable features:

    • Interactive visualization in Jupyter environments
    • Built-in support for PCA, NMF, and clustering
    • Customizable model fitting capabilities
    • Comprehensive metadata handling
    • Support for large datasets through Dask
  • Community aspects:

    • Over 60 contributors across 14 years
    • Regular tutorials and training sessions
    • Focus on documentation and code quality
    • Community-driven development model
    • Domain-agnostic core with specialized extensions
  • Recent developments:

    • 2.0 release focused on modularization
    • Separation of domain-specific code into extension packages (for example, eXSpy for EELS and EDS analysis)
    • Enhanced support for different data formats
    • Improved scalability for large datasets
    • Growing adoption beyond electron microscopy
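
To make the navigation/signal split and the PCA-style decomposition mentioned above more concrete, here is a minimal sketch in plain NumPy. It is not HyperSpy's API: the array shape, the variable names, and the choice of three components are all assumptions made for illustration, and the random data stands in for a real spectrum image.

```python
import numpy as np

# Hypothetical spectrum image: a 4 x 5 grid of scan positions (navigation axes),
# each holding a 100-channel spectrum (signal axis). Random data for illustration.
rng = np.random.default_rng(0)
data = rng.random((4, 5, 100))

nav_shape = data.shape[:2]   # (4, 5): where each measurement was taken
sig_shape = data.shape[2:]   # (100,): what was measured at each position

# Reducing over the signal axis leaves one value per scan position (a map).
intensity_map = data.sum(axis=-1)

# Reducing over the navigation axes leaves a single averaged spectrum.
mean_spectrum = data.mean(axis=(0, 1))

# PCA via SVD: flatten the navigation axes into rows, centre each channel,
# then decompose. Rows of `vt` are spectral components; columns of `u`
# (scaled by the singular values) are their weights at each position.
flat = data.reshape(-1, sig_shape[0])
centred = flat - flat.mean(axis=0)
u, s, vt = np.linalg.svd(centred, full_matrices=False)

explained_variance_ratio = s**2 / np.sum(s**2)

n_components = 3  # arbitrary choice for this sketch
factors = vt[:n_components]
loading_maps = (u[:, :n_components] * s[:n_components]).T.reshape(
    n_components, *nav_shape
)

print(intensity_map.shape, mean_spectrum.shape)
print(factors.shape, loading_maps.shape)
```

The point of the sketch is that every operation either collapses the signal axis (giving a map over positions) or collapses the navigation axes (giving a spectrum). Keeping track of which axes play which role is the bookkeeping that a library built around this split can do for the user.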