Francesc Alted - Btune: Making Compression Better | PyData Global 2023

Learn how Btune optimizes Blosc2 compression using genetic algorithms and neural networks to balance speed and compression ratio based on hardware and data patterns.

Key takeaways
  • Btune is a tool for finding the best compression parameters in Blosc2, balancing compression speed, decompression speed, and compression ratio

  • The tool has two main operating modes:

    • Genetic algorithm for parameter optimization
    • Neural network inference for quick parameter prediction
  • Key factors affecting compression performance:

    • Hardware characteristics (CPU cores, cache size, memory speed)
    • Data patterns and distribution
    • Selected codec (LZ4, Zstd, BloscLZ, etc.)
    • Filters (shuffle, bitshuffle)
    • Chunk sizes and splitting
    • Compression levels (0-9)
  • The tradeoff parameter (a value from 0 to 1) controls optimization priorities:

    • 0: Favor speed
    • 0.5: Balanced approach
    • 1: Favor compression ratio
  • Recommendations for optimal usage:

    • Train models on hardware similar to production environment
    • Don’t trust Btune blindly; validate its results experimentally
    • Consider chunk-by-chunk compression for better cache utilization
    • Use tracing to understand Btune’s decision process
    • LZ4 typically wins for speed, Zstd for compression ratio
  • Working with compressed data can sometimes be faster than working with uncompressed data, because compression reduces memory bandwidth requirements, particularly when the data fits in cache

  • Btune provides tools for assessing optimal compression parameters instead of relying on guesswork, but its results should still be validated through testing

  • Performance depends heavily on the specific use case: hardware, data characteristics, and optimization priorities must be considered together
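
The advice to validate results experimentally, together with the 0-to-1 tradeoff described above, can be illustrated with a small benchmark. This is only a minimal sketch and not how Btune works internally: it uses the standard library's zlib as a stand-in for the Blosc2 codecs, a hand-written byte shuffle as a stand-in for the shuffle filter, and a hypothetical linear score to mimic the tradeoff. The names `byte_shuffle`, `measure`, and `pick_best` are assumptions made for this example.

```python
import time
import zlib
from array import array
from itertools import product


def byte_shuffle(buf: bytes, typesize: int) -> bytes:
    """Group the i-th byte of every item together (the idea behind a shuffle filter)."""
    return b"".join(buf[i::typesize] for i in range(typesize))


def measure(data: bytes, level: int, shuffle: bool, typesize: int):
    """Return (compression ratio, compression speed in MB/s) for one setting."""
    start = time.perf_counter()
    payload = byte_shuffle(data, typesize) if shuffle else data
    compressed = zlib.compress(payload, level)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero timer
    return len(data) / len(compressed), len(data) / elapsed / 1e6


def pick_best(results: dict, tradeoff: float):
    """Pick the setting with the highest weighted score.

    Hypothetical scoring (an assumption for this sketch, not Btune's formula):
    ratio and speed are each normalised by the best value seen, then mixed
    linearly. tradeoff=0 looks only at speed, tradeoff=1 only at ratio.
    """
    if not 0.0 <= tradeoff <= 1.0:
        raise ValueError("tradeoff must be between 0 and 1")
    best_ratio = max(ratio for ratio, _ in results.values())
    best_speed = max(speed for _, speed in results.values())

    def score(key):
        ratio, speed = results[key]
        return tradeoff * ratio / best_ratio + (1 - tradeoff) * speed / best_speed

    return max(results, key=score)


# Sample data: 200,000 slowly increasing 64-bit integers.
data = array("q", range(0, 400_000, 2)).tobytes()

# Try every combination of compression level and shuffle on/off.
results = {
    (level, shuffle): measure(data, level, shuffle, typesize=8)
    for level, shuffle in product((1, 6, 9), (False, True))
}

for tradeoff in (0.0, 0.5, 1.0):
    level, shuffle = pick_best(results, tradeoff)
    ratio, speed = results[(level, shuffle)]
    print(f"tradeoff={tradeoff}: level={level} shuffle={shuffle} "
          f"ratio={ratio:.1f}x speed={speed:.0f} MB/s")
```

The timings depend on the machine running the script, which is the point of the recommendation: the same loop would need to run over real data chunks on hardware similar to production before its choices are trusted.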