Dan Schult - Sparse arrays in scipy.sparse | SciPy 2024

Learn about SciPy's transition from sparse matrices to arrays, including key differences, migration strategies, and storage formats. Essential for developers using scipy.sparse.

Key takeaways
  • SciPy is transitioning from sparse matrices to sparse arrays to better align with NumPy’s array API and modern practices

  • Key differences between sparse matrices and arrays:

    • Matrices are always 2D while arrays can be 1D or 2D
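    • Arrays use the @ operator for matrix multiplication (with matrices, * is matrix multiplication)
    • The * operator is element-wise for arrays, so the same expression can give a different result after migration
    • Indexing returns results of different dimensionality (matrices always return 2D results)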
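
The operator and dimensionality differences above can be sketched as follows. This is a minimal illustration, not code from the talk: the 2x2 values are arbitrary, and a reduction (`sum`) stands in for the dimensionality difference because it behaves the same way across recent SciPy releases.

```python
import numpy as np
import scipy.sparse as sp

dense = np.array([[1, 2], [0, 3]])  # arbitrary example values

M = sp.csr_matrix(dense)  # legacy sparse matrix interface
A = sp.csr_array(dense)   # sparse array interface

# Matrices: * means matrix multiplication.
matrix_product = (M * M).toarray()

# Arrays: * is element-wise, @ is matrix multiplication.
elementwise = (A * A).toarray()
array_matmul = (A @ A).toarray()

# Reductions: matrices stay 2D, arrays drop the reduced axis.
matrix_col_sums = M.sum(axis=0)  # np.matrix with shape (1, 2)
array_col_sums = A.sum(axis=0)   # 1D ndarray with shape (2,)

print(matrix_product)
print(elementwise)
print(matrix_col_sums.shape, array_col_sums.shape)
```

Because `M * M` and `A * A` both run without error but mean different things, this is the change most likely to slip through a migration unnoticed.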
  • Migration path is gradual:

    • New array classes exist alongside matrix classes
    • Construction functions have new array-based versions
    • Migration guide and developer support coming soon
    • Full deprecation planned after at least two release cycles
  • Current status:

    • 1D and 2D sparse arrays are fully functional
    • Indexing support is coming in SciPy 1.15
    • N-dimensional (>2D) support is planned for a future release
  • Storage formats:

    • CSR (Compressed Sparse Row)
    • CSC (Compressed Sparse Column)
    • COO (Coordinate format)
    • DIA (Diagonal format)
    • Different formats optimize for different sparsity patterns
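
As a rough sketch of how the formats listed above relate, the same data can be built once in COO form and converted to the others. The coordinates and values are made up for illustration.

```python
import numpy as np
import scipy.sparse as sp

# COO: one (row, col, value) triplet per stored entry.
rows = np.array([0, 1, 2, 2])
cols = np.array([0, 1, 0, 2])
vals = np.array([4.0, 5.0, 6.0, 7.0])
coo = sp.coo_array((vals, (rows, cols)), shape=(3, 3))

# Convert to the other formats; the stored values are identical.
csr = coo.tocsr()  # compressed by row: fast row slicing and matrix-vector products
csc = coo.tocsc()  # compressed by column: fast column slicing
dia = coo.todia()  # one stored vector per nonzero diagonal

# CSR keeps a row-pointer array: row i owns data[indptr[i]:indptr[i + 1]].
print(csr.format, csr.indptr)
# DIA records which diagonals are populated (0 is the main diagonal).
print(dia.format, dia.offsets)
```

COO is convenient for construction, while CSR and CSC are the usual choices for arithmetic. DIA only pays off when the nonzeros sit on a few diagonals.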
  • Migration recommendations:

    • Update existing matrix code first
    • Convert constructors from matrix to array versions
    • Change multiplication operators (* to @)
    • Update helper functions and shape handling
    • Ensure good test coverage
  • Backward compatibility maintained:

    • Old matrix code continues to work
    • Side-by-side implementation reduces migration friction
    • Matrix features are being maintained during the transition