"The Evolution of a Planetary-scale Distributed Database" by Kevin Scaldeferri (Strange Loop 2022)

Learn how a planetary-scale distributed database evolved over 9 years, leveraging Kubernetes, S3, Kafka, and a cellular architecture to simplify complex query paths and ensure scalability.

Key takeaways
  • Switched to a Kubernetes deployment over the system's 9-year evolution
  • Pushed data from the edge into the storage layer, treating S3 as the primary store
  • Designed a system that reuses successful architectural patterns
  • Cells are massively multi-tenant; they scale independently, distribute workloads, and carry their own failover and failback mechanisms
  • Stateful systems are still hard, but significant progress has been made
  • A problem of one sort or another in a particular cell? The cell can be retired and replaced
  • We want customers to be able to query all of their data, not just a subset of it
  • Wanted to treat data very differently: use S3 for storage and move back toward stock, off-the-shelf components
  • Our Kafka installation kept getting bigger and bigger, so ingest and query needed to be decoupled
  • Solution: use Kafka and S3, and build a metadata store, a query router, and live query workers
  • Made a decision to embark on a couple of changes in order to simplify some of these solutions
  • Cellular architecture is a great approach for capping the size of any tier
  • Realized some of these S3-based solutions could be reused elsewhere in the system
  • Data still comes in through the same endpoints customers have always used
  • The system has evolved significantly over time, with lessons learned along the way
  • Want to make sure we don't put two giant customers into the same cell
  • Decided to move away from decentralized storage and toward centralized storage
  • Building the next generation of the metadata system to help with data distribution
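The ingest/query decoupling described in the takeaways (push data through Kafka, land it in S3, record each object in a metadata store so the query side never touches the ingest pipeline) might be sketched like this. All names are hypothetical, and in-memory stand-ins replace Kafka, S3, and the metadata store:

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for Kafka, S3, and the metadata store.
kafka_topic = []                    # list of (customer_id, event) records
s3_bucket = {}                      # object_key -> list of events
metadata_store = defaultdict(list)  # customer_id -> list of S3 object keys

def ingest_batch(batch_id):
    """Drain the topic, write one S3 object per customer, register each object."""
    by_customer = defaultdict(list)
    while kafka_topic:
        customer_id, event = kafka_topic.pop(0)
        by_customer[customer_id].append(event)
    for customer_id, events in by_customer.items():
        key = f"{customer_id}/batch-{batch_id}"
        s3_bucket[key] = events                  # S3 as the primary store
        metadata_store[customer_id].append(key)  # queryable index of objects

def query_all(customer_id):
    """Query path resolves S3 objects via metadata; Kafka is never consulted."""
    return [e for key in metadata_store[customer_id] for e in s3_bucket[key]]

kafka_topic.extend([("acme", 1), ("acme", 2), ("globex", 3)])
ingest_batch(0)
print(query_all("acme"))  # -> [1, 2]
```

Because queries read only from the metadata store and S3, the Kafka tier can be sized for ingest throughput alone rather than growing with query load.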
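The placement constraint in the takeaways (never put two giant customers into the same cell) could be enforced with a simple rule at assignment time. The threshold, cell model, and function names below are illustrative assumptions, not the speaker's implementation:

```python
GIANT_THRESHOLD = 1_000_000  # hypothetical daily-event cutoff for a "giant" tenant

class Cell:
    def __init__(self, name):
        self.name = name
        self.customers = {}  # customer_id -> daily_events

    def has_giant(self):
        return any(v >= GIANT_THRESHOLD for v in self.customers.values())

def assign(customer_id, daily_events, cells):
    """Place a customer, never co-locating two giant tenants in one cell."""
    for cell in cells:
        if daily_events >= GIANT_THRESHOLD and cell.has_giant():
            continue  # skip cells that already host a giant customer
        cell.customers[customer_id] = daily_events
        return cell.name
    raise RuntimeError("no eligible cell; provision a new one")

cells = [Cell("cell-a"), Cell("cell-b")]
assign("giant-1", 5_000_000, cells)  # lands in cell-a
assign("giant-2", 3_000_000, cells)  # skips cell-a, lands in cell-b
```

A real placer would also weigh free capacity, but even this rule keeps any single cell's blast radius bounded, which is what makes retiring and replacing a troubled cell cheap.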
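A query router in this style fans a customer's query out to every cell that holds their data and merges the results, which is how customers can query all of their data rather than a subset. The routing table and the per-cell worker interface below are assumptions for the sketch:

```python
# Hypothetical routing table: customer_id -> cells holding that customer's data.
routing_table = {"acme": ["cell-a", "cell-c"]}

# Stand-in for per-cell live query workers and their local data.
cell_data = {
    "cell-a": {"acme": [10, 20]},
    "cell-c": {"acme": [30]},
}

def query_worker(cell, customer_id):
    """One cell's worker answers only for data resident in that cell."""
    return cell_data[cell].get(customer_id, [])

def route_query(customer_id):
    """Fan out to every cell listed for the customer and merge the results."""
    results = []
    for cell in routing_table.get(customer_id, []):
        results.extend(query_worker(cell, customer_id))
    return results

print(route_query("acme"))  # -> [10, 20, 30]
```

Keeping the cell list in a metadata/routing tier is what lets cells be added, retired, or replaced without customers changing the endpoints they have always used.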