We can't find the internet
Attempting to reconnect
Something went wrong!
Hang in there while we get back on track
"The Evolution of a Planetary-scale Distributed Database" by Kevin Scaldeferri (Strange Loop 2022)
Learn how a planetary-scale distributed database evolved over 9 years, leveraging Kubernetes, S3, Kafka, and cellular architecture to simplify complex query solutions and ensure scalability.
- Switched to Kubernetes deployment 9 years ago due to detail on one slide
- Edge data pushed into storage layer, treating S3 as primary storage
- Designed a system that reuses successful architectural patterns
- Cells are like massively multi-tenant, scale independently, distribute workloads, carry their own failover and failback mechanisms
- Stateful systems still hard, but significant progress made
- Problem of one sort or another in a particular cell? Allow us to retire a cell and replace it
- We want customers to be able to query all of their data, not just a subset of it
- Wanted to treat data very differently, but knew how, wanted to use S3, wanted to go back to something like stock
- Our Kafka installation is getting bigger and bigger, need to decouple ingest and query
- Solution: use Kafka, use S3, build metadata store, build query router, build live query workers
- Made a decision to embark on a pair of couple changes, but to simplify some of these solutions
- Cellular architecture great approach for trying to cap size of any tier
- Realized we could reuse some of these solutions to S3 gallery
- Still have data that’s coming in through the same end points that customers have always used
- Our system has evolved significantly over time, lessons learned along the way
- Want to make sure we don’t put two giant customers into the same cell
- Made a decision to move away from decentralizing storage and move towards centralized storage
- Building next generation of metadata system to help with data distribution