Exploring Zarr: From Fundamentals to Version 3.0 and Beyond [PyCon DE & PyData Berlin 2024]
Learn about Zarr, a format for large array storage: its fundamentals, key features, and improvements in v3. Explore cloud-native capabilities, compression, and migration paths.
- Zarr is a format for storing large arrays divided into compressed chunks, popular in genomics, geospatial, bioimaging and other scientific domains
- Key features:
    - Cloud-native storage support
    - Hierarchical organization of arrays into groups
    - Compressed chunked storage
    - Support for massive datasets (petabyte scale)
    - Language-agnostic specification
 
- Zarr v3 improvements over v2:
    - Metadata consolidated into a single JSON document per array or group
    - More language-agnostic specification
    - Better cloud storage optimization
    - Extension mechanism for adding features
    - Support for variable chunk sizes
    - Sharding codec to reduce latency
 
- Implementation ecosystem:
    - Python reference implementation
    - Implementations in C++, Rust, Julia, JavaScript, Java
    - Growing community and adoption
    - Regular community meetings and governance process
 
- Key concepts:
    - Arrays divided into equal-sized chunks
    - Each chunk is independently compressed
    - Metadata stored in zarr.json files
    - Dictionary-like key-value storage model
    - Support for hierarchical organization
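
The key concepts above can be illustrated with a small standard-library sketch. It is a toy model, not the zarr-python API: the helper names `write_array` and `read_item`, the array name `temperature`, and the exact key layout are assumptions made for illustration. It shows a one-dimensional array split into fixed-size chunks, each compressed on its own, with metadata kept in a `zarr.json` entry of a plain dictionary that stands in for the key-value store.

```python
import json
import zlib


def write_array(store, name, data, chunk_len):
    """Split a flat list of small ints (0-255) into chunks and store each one compressed."""
    # Metadata lives under a single JSON key, mirroring the zarr.json idea.
    # The field names here are illustrative, not the real Zarr metadata schema.
    meta = {"shape": [len(data)], "chunk_shape": [chunk_len], "codec": "zlib"}
    store[f"{name}/zarr.json"] = json.dumps(meta).encode()
    for start in range(0, len(data), chunk_len):
        chunk = bytes(data[start:start + chunk_len])
        # Each chunk is compressed independently, so it can be read on its own.
        # Simplification: the last chunk is stored shorter rather than padded.
        store[f"{name}/c/{start // chunk_len}"] = zlib.compress(chunk)


def read_item(store, name, index):
    """Read one element by fetching and decompressing only the chunk that holds it."""
    meta = json.loads(store[f"{name}/zarr.json"])
    chunk_len = meta["chunk_shape"][0]
    chunk = zlib.decompress(store[f"{name}/c/{index // chunk_len}"])
    return chunk[index % chunk_len]


# Any dict-like object can play the role of the store; an object-storage
# bucket maps keys to objects in much the same way.
store = {}
write_array(store, "temperature", list(range(10)), chunk_len=4)
value = read_item(store, "temperature", 9)
```

Reading index 9 touches only the metadata entry and the third chunk. The other chunks are never decompressed, which is the property that makes chunked storage practical for very large arrays.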
 
- Extension mechanism in v3:
    - Allows adding features without changing core spec
    - Example: the sharding codec groups multiple chunks into a single storage object
    - Community-driven proposal process (Zarr Enhancement Proposals, ZEPs)
    - Maintains backwards compatibility
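
The idea behind sharding can be sketched in a few lines of standard-library Python. This is a simplified illustration, not the binary layout defined by the sharding codec specification: the helpers `build_shard` and `read_chunk` and the footer format are assumptions. Several independently compressed chunks are packed into one object, followed by an index of (offset, length) pairs, so a store holds fewer objects while a single chunk can still be located by its byte range.

```python
import struct
import zlib


def build_shard(chunks):
    """Pack compressed chunks into one bytes object followed by an (offset, length) index."""
    body = b""
    index = []
    for raw in chunks:
        compressed = zlib.compress(raw)
        index.append((len(body), len(compressed)))
        body += compressed
    # Hypothetical footer: one pair of little-endian uint64 values per chunk.
    footer = b"".join(struct.pack("<QQ", offset, length) for offset, length in index)
    return body + footer


def read_chunk(shard, n_chunks, i):
    """Read chunk i using only its index entry and its own byte range."""
    footer_start = len(shard) - 16 * n_chunks
    offset, length = struct.unpack_from("<QQ", shard, footer_start + 16 * i)
    return zlib.decompress(shard[offset:offset + length])


# Four small chunks end up in a single stored object.
chunks = [bytes([i]) * 100 for i in range(4)]
shard = build_shard(chunks)
third = read_chunk(shard, n_chunks=4, i=2)
```

Against remote storage, the two slices in `read_chunk` would correspond to byte-range requests on one object, rather than a separate object per chunk.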
 
- Migration path:
    - v2 datasets can still be used
    - Tools are being developed for v2-to-v3 conversion
    - No requirement to immediately convert existing data