William Dealtry - Data persistence with consistency and performance in a truly serverless system

Learn how ArcticDB enables high-performance data persistence in serverless systems with features like ACID compliance, versioning, and Python integration for finance applications.

Key takeaways
  • ArcticDB is a Python-first data frame database designed for high-performance data persistence in serverless environments, with a focus on finance-industry use cases

  • Key features include:

    • Schema-less storage with multi-dimensional data support
    • Versioning and time travel capabilities
    • Immutable data blocks to prevent corruption
    • Support for several storage backends (S3, Azure Blob Storage, local disk)
    • Lazy query evaluation with vectorized execution
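To make the versioning and time-travel idea concrete, here is a minimal sketch in plain Python. It is a toy model, not ArcticDB's implementation or API: the class name ToyVersionedStore, its methods, and the sample price rows are all made up for illustration. It assumes only that every write stores a new immutable block and appends a pointer to a per-symbol version list.

```python
import hashlib
import pickle


class ToyVersionedStore:
    """Toy model of versioned, immutable storage (hypothetical, not ArcticDB)."""

    def __init__(self):
        self._blocks = {}    # content hash -> serialized bytes, written once
        self._versions = {}  # symbol -> list of content hashes, one per version

    def write(self, symbol, rows):
        payload = pickle.dumps(rows)
        key = hashlib.sha256(payload).hexdigest()
        # Blocks are only ever added, never modified in place.
        self._blocks.setdefault(key, payload)
        history = self._versions.setdefault(symbol, [])
        history.append(key)
        return len(history) - 1  # version number assigned to this write

    def read(self, symbol, as_of=None):
        history = self._versions[symbol]
        # Default to the latest version; otherwise read the requested one.
        version = len(history) - 1 if as_of is None else as_of
        return pickle.loads(self._blocks[history[version]])


store = ToyVersionedStore()
v0 = store.write("prices", [("2024-01-02", 101.5)])
v1 = store.write("prices", [("2024-01-02", 101.5), ("2024-01-03", 102.0)])

latest = store.read("prices")                # most recent version
original = store.read("prices", as_of=v0)    # "time travel" to the first write
```

Because old blocks are never overwritten, reading an earlier version is just a lookup through an older pointer; nothing has to be reconstructed.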
  • Architecture highlights:

    • Uses structured keys and version keys for data organization
    • Implements persistent B-trees for efficient data access
    • Employs an ECS (Entity Component System) architecture
    • Supports spill-to-storage for handling large datasets
    • Hybrid storage approach combining fast and slow storage tiers
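The "persistent" in persistent B-trees refers to a data structure that keeps old versions intact when it is updated. The sketch below illustrates that property with a much simpler structure, a persistent binary search tree using path copying, rather than a real B-tree. The Node type and the insert and lookup helpers are hypothetical names chosen for this example and say nothing about how ArcticDB lays out its keys.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Immutable tree node; updates create new nodes instead of mutating."""
    key: int
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root, key, value):
    """Return a new root. Only nodes on the search path are copied."""
    if root is None:
        return Node(key, value)
    if key < root.key:
        return Node(root.key, root.value, insert(root.left, key, value), root.right)
    if key > root.key:
        return Node(root.key, root.value, root.left, insert(root.right, key, value))
    return Node(key, value, root.left, root.right)  # replace existing key


def lookup(root, key):
    """Standard binary search; returns None when the key is absent."""
    while root is not None:
        if key == root.key:
            return root.value
        root = root.left if key < root.key else root.right
    return None


v1 = None
for k, v in [(50, "a"), (30, "b"), (70, "c")]:
    v1 = insert(v1, k, v)

v2 = insert(v1, 60, "d")  # new version; v1 remains valid and unchanged
```

Here v2 shares its entire left subtree with v1, so a new version costs only the nodes along one path. The same idea is what makes keeping many versions of a large index affordable.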
  • Performance optimizations:

    • Parallel vectorized execution pipeline
    • Efficient compression and data transformation during read/write
    • Minimal metadata overhead
    • Smart chunking and block management
    • In-memory processing with storage spillover capability
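A rough sketch of chunked, compressed storage with a parallel read path, using only the Python standard library. It assumes fixed-size chunks of doubles compressed with zlib, which is an illustrative choice and not a description of ArcticDB's codecs or chunking policy. CHUNK_ROWS and all function names are hypothetical, and the per-chunk list comprehension stands in for what a real engine would do with vectorized kernels.

```python
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_ROWS = 4  # hypothetical chunk size, kept tiny for illustration


def write_chunks(values):
    """Split values into fixed-size chunks and compress each independently."""
    chunks = []
    for start in range(0, len(values), CHUNK_ROWS):
        part = values[start:start + CHUNK_ROWS]
        raw = struct.pack(f"<{len(part)}d", *part)  # little-endian doubles
        chunks.append(zlib.compress(raw))
    return chunks


def read_chunk(blob, predicate):
    """Decompress one chunk and filter it; chunks do not depend on each other."""
    raw = zlib.decompress(blob)
    values = struct.unpack(f"<{len(raw) // 8}d", raw)
    return [v for v in values if predicate(v)]


def read_filtered(chunks, predicate):
    """Process chunks on a thread pool and concatenate the results."""
    with ThreadPoolExecutor() as pool:
        # map() yields results in input order, so row order is preserved.
        parts = pool.map(lambda blob: read_chunk(blob, predicate), chunks)
        return [v for part in parts for v in part]


values = [float(i) for i in range(10)]
chunks = write_chunks(values)
result = read_filtered(chunks, lambda v: v >= 6.0)
```

Because each chunk is decompressed and filtered on its own, the work parallelizes naturally and only matching rows need to be kept in memory.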
  • Data consistency features:

    • ACID compliance for data operations
    • Atomic updates without data races
    • Immutable storage model prevents corruption
    • Version tracking and history preservation
    • Consistent schema enforcement per data frame
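One common way to get atomic updates on top of immutable blocks is to write all data first and publish a single pointer last. The sketch below shows that ordering in a toy in-memory store. It is an assumption about the general technique, not a description of ArcticDB's write path; ToyAtomicStore and the fail_after parameter are invented here to simulate a crash partway through a write.

```python
class ToyAtomicStore:
    """Toy store: data blocks are written first, the version pointer last."""

    def __init__(self):
        self._blocks = {}   # block id -> immutable tuple of rows
        self._head = {}     # symbol -> block ids that make up the current version
        self._next_id = 0

    def _put_block(self, rows):
        block_id = self._next_id
        self._next_id += 1
        self._blocks[block_id] = tuple(rows)
        return block_id

    def write(self, symbol, chunks, fail_after=None):
        new_ids = []
        for i, chunk in enumerate(chunks):
            if fail_after is not None and i == fail_after:
                raise IOError("simulated failure mid-write")
            new_ids.append(self._put_block(chunk))
        # A single assignment publishes the new version; until this line runs,
        # readers keep seeing the previous version in full.
        self._head[symbol] = tuple(new_ids)

    def read(self, symbol):
        return [row for bid in self._head[symbol] for row in self._blocks[bid]]


store = ToyAtomicStore()
store.write("trades", [[1, 2], [3, 4]])

try:
    store.write("trades", [[10, 20], [30, 40]], fail_after=1)
except IOError:
    pass
after_failure = store.read("trades")   # old version, untouched

store.write("trades", [[10, 20], [30, 40]])
after_success = store.read("trades")   # new version, visible all at once
```

The failed write leaves an orphaned block behind, but no reader can reach it, so a partial update is never visible. A real system would also need a cleanup step for such orphans.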
  • Practical benefits:

    • No server maintenance required
    • Python-native interface
    • Compatible with pandas, pyarrow, and polars
    • Easy scaling with cloud storage
    • Cost-effective storage management options