"KalDB: A cloud native log search platform" by Suman Karumuri (Strange Loop 2022)

Suman Karumuri, architect on Slack's observability team, presents KalDB, a cloud-native log search platform used to manage a petabyte of log data, highlighting its architecture, features, and scalability.

Key takeaways

KalDB manages a petabyte of log data with a 7-day retention period at Slack.
Slack’s use cases center on full-text search, and older logs are indexed eventually.
Lucene is a feasible storage engine for log data.
The indexing process can be optimized by storing older logs in S3 and using tiered storage.
The common fields in log messages can be extracted into key-value pairs.
Schema-less data allows for easier data management and query efficiency.
KalDB prioritizes indexing fresh logs over older logs.
Using cache nodes allows for faster query responses and better hardware utilization.
At scale, log search runs into four kinds of problems: high operational overhead, delayed logs, noisy neighbors, and field conflicts.
The cluster manager assigns tasks to recovery indexers and manages data life cycles.
Metadata stores are crucial for efficient data retrieval.
Using S3 as a deep store for logs reduces storage costs.
KalDB’s architecture allows for elastic scalability and Kubernetes-native integration.
The system employs cache nodes that download segments from S3 and serve queries.
Queries typically revolve around last-day data, making it essential to have efficient query execution.
Duplicate information in logs and traces can be reduced by using aggregation support and ES-compatible APIs.
Fauna and KalDB can be used to overcome field conflicts.
Suman Karumuri is an architect on the observability team at Slack, building and running petabyte-scale systems.
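
The takeaways about prioritizing fresh logs and handing older data to recovery indexers can be illustrated with a small sketch. This is not KalDB's actual code: it assumes the indexer reads from an offset-based log stream, and the names `RecoveryTask`, `plan_indexing`, and `max_lag` are hypothetical, chosen only for illustration. The idea shown is that when an indexer has fallen too far behind, it jumps to the newest data and leaves the skipped range as a task to be backfilled later.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class RecoveryTask:
    """Hypothetical record: offset range [start_offset, end_offset) to backfill."""
    start_offset: int
    end_offset: int


def plan_indexing(
    committed_offset: int, head_offset: int, max_lag: int
) -> Tuple[int, Optional[RecoveryTask]]:
    """Decide where the live indexer should resume reading.

    Returns (resume_offset, recovery_task). The recovery task is None
    when the indexer is close enough to the head to catch up by itself.
    """
    lag = head_offset - committed_offset
    if lag <= max_lag:
        # Small backlog: keep indexing from where we left off.
        return committed_offset, None
    # Large backlog: serve fresh logs first and hand the skipped
    # range to a recovery indexer (an assumption of this sketch).
    return head_offset, RecoveryTask(committed_offset, head_offset)


if __name__ == "__main__":
    print(plan_indexing(100, 150, max_lag=1000))
    print(plan_indexing(100, 5000, max_lag=1000))
```

In this sketch the threshold `max_lag` is the only tuning knob: a smaller value favors freshness of search results, while a larger value produces fewer backfill tasks.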
