Francesc Alted - Btune: Making Compression Better | PyData Global 2023
Learn how Btune optimizes Blosc2 compression, using genetic algorithms and neural networks to balance speed and compression ratio according to hardware and data patterns.
- Btune is a tool for finding optimal compression parameters in Blosc2, balancing compression speed, decompression speed and compression ratio
- The tool has two main operating modes:
  - Genetic algorithm for parameter optimization
  - Neural network inference for quick parameter prediction
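To make the genetic-algorithm mode more concrete, here is a minimal sketch of an evolutionary parameter search. It is not Btune's implementation: it uses the standard-library `zlib` as a stand-in for Blosc2, and the search space (level, strategy, window bits), the fitness function (compression ratio only) and every function name are assumptions made for illustration.

```python
import random
import zlib

# Hypothetical search space, using zlib knobs as stand-ins for Blosc2 parameters.
LEVELS = range(0, 10)
STRATEGIES = (zlib.Z_DEFAULT_STRATEGY, zlib.Z_FILTERED, zlib.Z_HUFFMAN_ONLY)
WBITS = range(10, 16)
GENES = (LEVELS, STRATEGIES, WBITS)


def random_genome(rng):
    """Pick one value per gene at random."""
    return tuple(rng.choice(options) for options in GENES)


def fitness(genome, data):
    """Score a genome by compression ratio (higher is better)."""
    level, strategy, wbits = genome
    comp = zlib.compressobj(level, zlib.DEFLATED, wbits, 8, strategy)
    packed = comp.compress(data) + comp.flush()
    return len(data) / len(packed)


def crossover(a, b, rng):
    """Build a child by taking each gene from one of the two parents."""
    return tuple(rng.choice(pair) for pair in zip(a, b))


def mutate(genome, rng):
    """Re-draw one randomly chosen gene."""
    genes = list(genome)
    slot = rng.randrange(len(GENES))
    genes[slot] = rng.choice(GENES[slot])
    return tuple(genes)


def evolve(data, generations=8, population_size=12, seed=0):
    """Keep the better half each generation and refill with mutated children."""
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(population_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda g: fitness(g, data), reverse=True)
        parents = ranked[: population_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    return max(population, key=lambda g: fitness(g, data))


# Sample data: 20,000 little-endian 32-bit integers.
sample = b"".join(i.to_bytes(4, "little") for i in range(0, 80_000, 4))
best = evolve(sample)
print("best genome:", best, "ratio:", round(fitness(best, sample), 2))
```

A real tuner would also have to weigh compression and decompression speed, which is why the search is repeated per hardware and per dataset rather than done once.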
- Key factors affecting compression performance:
  - Hardware characteristics (CPU cores, cache size, memory speed)
  - Data patterns and distribution
  - Selected codec (LZ4, Zstd, BloscLZ, etc.)
  - Filters (shuffle, bitshuffle)
  - Chunk sizes and splitting
  - Compression levels (0-9)
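As an illustration of why filters matter, the sketch below applies a plain byte shuffle before compressing. It uses the standard-library `zlib` as a stand-in codec rather than Blosc2 itself, and `byte_shuffle` / `byte_unshuffle` are hypothetical helper names written for this example, not the library's own filter code.

```python
import struct
import zlib


def byte_shuffle(buf: bytes, itemsize: int) -> bytes:
    """Group the k-th byte of every item together (a plain byte shuffle)."""
    return b"".join(buf[k::itemsize] for k in range(itemsize))


def byte_unshuffle(buf: bytes, itemsize: int) -> bytes:
    """Invert byte_shuffle, restoring the original item layout."""
    n = len(buf) // itemsize
    out = bytearray(len(buf))
    for k in range(itemsize):
        out[k::itemsize] = buf[k * n:(k + 1) * n]
    return bytes(out)


# 100,000 slowly increasing 32-bit integers: neighbouring items share their high bytes.
values = range(0, 400_000, 4)
raw = struct.pack(f"<{len(values)}i", *values)

plain = zlib.compress(raw, 6)
shuffled = zlib.compress(byte_shuffle(raw, 4), 6)

print("raw bytes:          ", len(raw))
print("compressed:         ", len(plain))
print("shuffle + compress: ", len(shuffled))
```

The shuffle helps here because the high bytes of neighbouring integers are identical, so grouping them produces long runs. On data without that structure the filter may not help, which is one reason the best parameters depend on the data.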
- The tradeoff parameter, a value between 0 and 1, controls optimization priorities:
  - 0: Favor speed
  - 0.5: Balanced approach
  - 1: Favor compression ratio
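One way to picture the tradeoff value is as a weight that blends a normalized speed score with a normalized ratio score. The sketch below is an assumption made for illustration, not Btune's actual scoring formula, and the candidate numbers are made-up placeholders rather than measurements.

```python
def score(speed, ratio, tradeoff, max_speed, max_ratio):
    """Blend normalized speed and ratio; tradeoff=0 is all speed, 1 is all ratio."""
    if not 0.0 <= tradeoff <= 1.0:
        raise ValueError("tradeoff must be between 0 and 1")
    return (1.0 - tradeoff) * speed / max_speed + tradeoff * ratio / max_ratio


# Placeholder candidates: name -> (speed in MB/s, compression ratio). Not real data.
candidates = {
    "fast": (3000.0, 2.0),
    "balanced": (2100.0, 4.5),
    "small": (300.0, 6.0),
}


def pick(tradeoff):
    """Return the candidate name with the highest blended score."""
    max_speed = max(speed for speed, _ in candidates.values())
    max_ratio = max(ratio for _, ratio in candidates.values())
    return max(
        candidates,
        key=lambda name: score(*candidates[name], tradeoff, max_speed, max_ratio),
    )


for tradeoff in (0.0, 0.5, 1.0):
    print(tradeoff, "->", pick(tradeoff))
```

With these placeholder numbers, a tradeoff of 0 selects the fastest candidate, 1 selects the one with the highest ratio, and 0.5 selects the middle one.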
- Recommendations for optimal usage:
  - Train models on hardware similar to the production environment
  - Don’t trust Btune blindly; validate results experimentally
  - Consider chunk-by-chunk compression for better cache utilization
  - Use tracing to understand Btune’s decision process
  - LZ4 typically wins for speed, Zstd for compression ratio
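Experimental validation can be as simple as timing each candidate configuration on a sample of real data and checking the round trip. The harness below is a minimal sketch that uses the standard-library `zlib` and `lzma` codecs as stand-ins for whatever a tuner recommends; the `measure` function, its return fields and the synthetic sample data are assumptions for this example.

```python
import lzma
import time
import zlib


def measure(name, compress, decompress, data, repeats=3):
    """Return ratio and best-of-N timings for one compress/decompress pair."""
    best_c = best_d = float("inf")
    packed = b""
    for _ in range(repeats):
        t0 = time.perf_counter()
        packed = compress(data)
        t1 = time.perf_counter()
        restored = decompress(packed)
        t2 = time.perf_counter()
        if restored != data:
            raise RuntimeError(f"{name}: round trip failed")
        best_c = min(best_c, t1 - t0)
        best_d = min(best_d, t2 - t1)
    return {
        "name": name,
        "ratio": len(data) / len(packed),
        "compress_s": best_c,
        "decompress_s": best_d,
    }


# Synthetic CSV-like sample; replace with a slice of the real dataset.
data = "".join(f"{i},{i * i % 97}\n" for i in range(50_000)).encode()

results = [
    measure("zlib-1", lambda d: zlib.compress(d, 1), zlib.decompress, data),
    measure("zlib-9", lambda d: zlib.compress(d, 9), zlib.decompress, data),
    measure("lzma", lzma.compress, lzma.decompress, data),
]

for r in results:
    print(f"{r['name']:8s} ratio={r['ratio']:.2f} "
          f"c={r['compress_s'] * 1e3:.1f} ms d={r['decompress_s'] * 1e3:.1f} ms")
```

Timings vary between machines, so the comparison is only meaningful when run on hardware similar to production.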
- Working with compressed data can sometimes be faster than working with uncompressed data, because compression reduces memory bandwidth requirements, particularly when the data being processed fits in CPU cache
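The chunk-by-chunk pattern behind this idea can be sketched as follows: data is stored as independently compressed chunks, and each chunk is decompressed and consumed before moving on, so only one small decompressed buffer is live at a time. This example uses the standard-library `zlib` as a stand-in codec, and the chunk size and function names are assumptions. Pure Python will not show the speedup; the sketch only illustrates the access pattern.

```python
import struct
import zlib

# 16,384 int32 items = 64 KiB per chunk; assumed small enough for a typical CPU cache.
CHUNK_ITEMS = 16_384


def compress_chunks(values, chunk_items=CHUNK_ITEMS):
    """Pack values as little-endian int32 and compress each chunk separately."""
    chunks = []
    for start in range(0, len(values), chunk_items):
        part = values[start:start + chunk_items]
        raw = struct.pack(f"<{len(part)}i", *part)
        chunks.append(zlib.compress(raw, 1))
    return chunks


def chunked_sum(chunks):
    """Decompress one chunk at a time and reduce it before touching the next."""
    total = 0
    for packed in chunks:
        raw = zlib.decompress(packed)
        total += sum(struct.unpack(f"<{len(raw) // 4}i", raw))
    return total


values = list(range(100_000))
chunks = compress_chunks(values)
total = chunked_sum(chunks)
print("chunks:", len(chunks), "sum:", total)
```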
- Btune provides tools to assess optimal compression parameters rather than relying on guesswork, but results should be validated through testing
- Performance is highly dependent on the specific use case: hardware, data characteristics and optimization priorities must be considered together