Sean Law - STUMPY: Modern Time Series Analysis with Matrix Profiles | SciPy 2024

Learn how STUMPY, a Python library for time series analysis, uses matrix profiles to efficiently find patterns and anomalies in large datasets without prior training.

Key takeaways
  • Matrix profiles provide a powerful way to analyze time series data by finding similar patterns and anomalies without requiring prior knowledge or training data

  • STUMPY is a Python library that implements matrix profiles with high performance, supporting multi-CPU/GPU processing, streaming data, and distributed computing via Dask

  • Core capabilities include:

    • Finding exact nearest neighbors and motifs in time series data
    • Detecting anomalies and conserved behaviors
    • Supporting multidimensional time series analysis
    • Providing pan-matrix profiles for variable-length pattern matching
  • Key advantages:

    • User-friendly API requiring minimal parameters
    • Highly interpretable results based on Euclidean distance
    • Scalable to large datasets (50M+ points)
    • No need for data preprocessing like detrending
    • 100% test coverage and battle-tested in production
  • Technical details:

    • Uses sliding window Euclidean distance calculations
    • Leverages FFT and computation reuse for efficiency
    • Supports z-normalization of subsequences
    • Minimal dependencies (NumPy, SciPy, Numba)
    • Recent releases run 15-20% faster
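To make the sliding-window idea concrete, here is a minimal brute-force sketch in plain Python. It is not how STUMPY computes the result (the library relies on FFT and computation reuse, as noted above); it only shows what a matrix profile contains. The function names, the toy series, and the exclusion zone of `ceil(m / 4)` are assumptions chosen for this illustration.

```python
import math


def znorm(window):
    """Rescale a window to zero mean and unit standard deviation."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))
    if std == 0:
        return [0.0] * len(window)  # constant window: a simplifying assumption
    return [(x - mean) / std for x in window]


def naive_matrix_profile(series, m):
    """Brute-force matrix profile in O(n^2 * m) time, for illustration only."""
    n_sub = len(series) - m + 1
    windows = [znorm(series[i:i + m]) for i in range(n_sub)]
    excl = math.ceil(m / 4)  # skip trivial matches against overlapping neighbors
    profile = [math.inf] * n_sub
    indices = [-1] * n_sub
    for i in range(n_sub):
        for j in range(n_sub):
            if abs(i - j) <= excl:
                continue
            d = math.dist(windows[i], windows[j])  # Euclidean distance
            if d < profile[i]:
                profile[i], indices[i] = d, j
    return profile, indices


# Toy series: the shape [0, 1, 3, 1, 0] appears at positions 0 and 8.
series = [0, 1, 3, 1, 0, 5, 2, 7, 0, 1, 3, 1, 0, 4, 9, 2]
profile, indices = naive_matrix_profile(series, m=5)
print(indices[0], profile[0])
```

Each profile entry is the z-normalized Euclidean distance from one subsequence to its closest non-overlapping match, and the index array records where that match starts. Because the two occurrences of the repeated shape are identical, their distance is zero and they point at each other.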
  • Common use cases:

    • Pattern discovery and motif detection
    • Anomaly detection
    • Time series joins and comparisons
    • Clustering with matrix profile distance
    • Exploratory data analysis
  • Active open-source project with:

    • 9 million+ downloads
    • 3,000+ GitHub stars
    • Regular releases and updates
    • Support for latest Python/NumPy versions
    • Extensive documentation and tutorials