"KalDB: A cloud native log search platform" by Suman Karumuri (Strange Loop 2022)
Suman Karumuri, architect on Slack's observability team, presents KalDB, a cloud-native log search platform used to manage a petabyte of log data, highlighting its architecture, features, and scalability.
- KalDB manages a petabyte of log data with a 7-day retention period at Slack.
- Slack’s use cases center on full-text search; older logs are indexed eventually.
- Lucene is a feasible storage engine for log data.
- The indexing process can be optimized by storing older logs in S3 and using tiered storage.
- The common fields in log messages can be extracted into key-value pairs.
- Schema-less data allows for easier data management and query efficiency.
- KalDB prioritizes indexing fresh logs over older logs.
- Using cache nodes allows for faster query responses and better hardware utilization.
- At scale, log search runs into four kinds of problems: high operational overhead, delayed logs, noisy neighbors, and field conflicts.
- The cluster manager assigns tasks to recovery indexers and manages data life cycles.
- Metadata stores are crucial for efficient data retrieval.
- Using S3 as a deep store for logs reduces storage costs.
- KalDB’s architecture allows for elastic scalability and Kubernetes-native integration.
- The system employs cache nodes that download segments from S3 and serve queries.
- Queries typically revolve around last-day data, making it essential to have efficient query execution.
- Duplicate information in logs and traces can be reduced by using aggregation support and ES-compatible APIs.
- Fauna and KalDB can be used to overcome field conflicts.
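
Two of the points above, extracting common fields into key-value pairs and field conflicts, can be illustrated with a small sketch. It is not KalDB's implementation: the function names `flatten` and `find_field_conflicts`, the dotted-key convention, and the sample log lines are all assumptions made for illustration. A "field conflict" is taken here to mean the same field name arriving with different value types.

```python
import json
from typing import Any


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested JSON objects into dotted key-value pairs (hypothetical helper)."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects, extending the dotted prefix.
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def find_field_conflicts(lines: list[str]) -> dict[str, set[str]]:
    """Return fields that appear with more than one value type (hypothetical helper)."""
    seen: dict[str, set[str]] = {}
    for line in lines:
        for field, value in flatten(json.loads(line)).items():
            seen.setdefault(field, set()).add(type(value).__name__)
    # Keep only the fields observed with two or more distinct types.
    return {field: types for field, types in seen.items() if len(types) > 1}


# Made-up sample logs: "http.status" is an integer in one line and a string in the other.
logs = [
    '{"service": "web", "http": {"status": 200}, "duration_ms": 12}',
    '{"service": "api", "http": {"status": "500"}, "duration_ms": 48}',
]
conflicts = find_field_conflicts(logs)
```

In this sketch `conflicts` contains only `http.status`, since every other field keeps a single type across both lines. A real system would still have to decide how to resolve such a conflict, which the summary does not describe.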
- Suman Karumuri is an architect on the observability team at Slack, building and running petabyte-scale systems.