Francesc Alted - Blosc2: Fast And Flexible Handling Of N-Dimensional and Sparse Datasets
Learn about Blosc2, a fast data compression library with advanced features like double partitioning, AI-powered parameter tuning, and support for massive N-dimensional datasets.
- Blosc2 is both a C and Python library for fast data compression, with a simple format specification under 300 lines
- Features double partitioning (chunks and blocks), enabling more selective and faster queries compared to single-partition formats
- Supports multi-dimensional arrays in 63-bit containers and can handle datasets of up to 8 trillion cells (~8 TB)
- Includes BTune for automatic compression parameter selection using:
  - Genetic algorithms to test parameter combinations
  - Deep learning models for real-time codec/filter selection
  - Local training capabilities for custom datasets
- Offers dynamic plugin support for extending functionality with custom codecs and filters
- Provides integration with HDF5 through PyTables and h5py wrappers
- Achieves 5-8x speedups when queries take advantage of the second partition (blocks)
- Implements JPEG 2000 support through the Grok plugin for lossy compression
- Mimics NumPy API for ease of use and familiar syntax
- Supports multiple languages beyond Python, including C++, Rust, Julia, and R, through the underlying C-Blosc2 library
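
To illustrate why double partitioning allows more selective queries, here is a minimal sketch in plain Python. It is a simplified counting model, not Blosc2's actual implementation: the function names are hypothetical, and it assumes each chunk dimension is a multiple of the block dimension and ignores padding at the array edges. It compares how many cells must be decompressed to serve a slice when whole chunks are the smallest readable unit versus when only the overlapping blocks are read.

```python
from math import prod


def touched(start, stop, size):
    """Indices of the partitions of length `size` overlapped by [start, stop)."""
    return range(start // size, (stop - 1) // size + 1)


def cells_to_decompress(slice_start, slice_stop, chunks, blocks):
    """Count cells decompressed to serve a slice (hypothetical helper).

    Returns (single, double): cells read when every overlapping chunk is
    decompressed in full, versus when only the overlapping blocks are.
    Assumes chunks are exact multiples of blocks in every dimension.
    """
    # Single partition: each chunk the slice overlaps is decompressed whole.
    n_chunks = prod(
        len(touched(a, b, c)) for a, b, c in zip(slice_start, slice_stop, chunks)
    )
    single = n_chunks * prod(chunks)

    # Double partition: only the blocks the slice overlaps are decompressed.
    n_blocks = prod(
        len(touched(a, b, s)) for a, b, s in zip(slice_start, slice_stop, blocks)
    )
    double = n_blocks * prod(blocks)
    return single, double


# One full row (row 5, columns 0..999) of a 1000 x 1000 array,
# stored with 100 x 100 chunks and 10 x 10 blocks.
single, double = cells_to_decompress((5, 0), (6, 1000), (100, 100), (10, 10))
print(single, double)  # 100000 10000
```

In this toy case the slice holds 1,000 cells; reading whole chunks touches 100,000 cells, while reading only the overlapping blocks touches 10,000. The real gain depends on the codec, the chunk and block shapes, and the access pattern, so the ratio here should not be read as a benchmark.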