"In the Land of the Sizing, the One-Partition Kafka Topic is King" by Ricardo Ferreira

Discover the secrets of high-performance Kafka applications by mastering the Kafka partition, the unit of parallelism and storage, starting with the humble one-partition topic.

Key takeaways
  • Kafka partitions are the unit of parallelism and storage.
  • The default range assignor is sufficient for most use cases, but can produce unbalanced assignments as group topologies grow; an alternative assignor can be configured (see the consumer sketch after this list).
  • Partitions are not magic: understanding how they behave is crucial to building high-performance Kafka applications.
  • Partitions are also Kafka’s unit of durability: each partition is replicated independently, which is what makes data consistency possible.
  • A common formula for sizing a topic is #partitions = max(t/p, t/c), where t is the target throughput and p and c are the measured per-partition throughputs of a producer and a consumer (see the sizing sketch after this list).
  • Throughput can bottleneck on the broker’s replication work and on serialization and deserialization in the clients.
  • Event processing is often CPU-intensive, so consumer CPU, not partition count, can become the limiting factor.
  • Brokers can fail with “Too many open files” errors when they run out of file handles, since every partition maps to log segment and index files on disk.
  • The replication factor multiplies every partition’s storage and network cost, which impacts scalability and performance (see the topic-creation sketch after this list).
  • Kafka partitions should be distributed evenly across brokers to maximize storage and CPU utilization.
  • Stopping a consumer does not stop the processing of its assigned partitions; the group rebalances and hands them to the remaining consumers (see the consumer sketch after this list).
  • Poison pills, events that repeatedly fail deserialization or processing, can stall a partition unless the consumer skips or dead-letters them (see the poison-pill sketch after this list).
  • Kafka Streams applications form a single consumer group and distribute partitions across stream tasks automatically (see the Streams sketch after this list).
  • A consistent storage strategy is necessary for high-performance Kafka applications.
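
Sizing sketch. As a back-of-the-envelope illustration of the formula #partitions = max(t/p, t/c), here is a minimal Java sketch; the throughput figures are hypothetical and stand in for numbers you would measure against your own producers and consumers.

```java
public class PartitionSizing {

    // #partitions = max(t/p, t/c): enough partitions for producers
    // to hit the target throughput and for consumers to keep up with it.
    static int partitionsFor(double targetMBps,
                             double producerMBpsPerPartition,
                             double consumerMBpsPerPartition) {
        double neededByProducers = targetMBps / producerMBpsPerPartition;
        double neededByConsumers = targetMBps / consumerMBpsPerPartition;
        return (int) Math.ceil(Math.max(neededByProducers, neededByConsumers));
    }

    public static void main(String[] args) {
        // Hypothetical measurements: 100 MB/s target, 10 MB/s per partition
        // on the producer side, 5 MB/s per partition on the consumer side.
        System.out.println(partitionsFor(100, 10, 5)); // prints 20
    }
}
```

The consumer side usually dominates, because consuming includes deserialization and processing, not just I/O.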
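Topic-creation sketch. Once the partition count and replication factor are settled, the Admin API creates the topic and the brokers spread the replicas across the cluster. A minimal sketch, assuming a local broker and a hypothetical "orders" topic:

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutionException;

public class CreateTopic {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address

        try (Admin admin = Admin.create(props)) {
            // 20 partitions (from the sizing sketch) replicated 3 ways;
            // Kafka assigns the partition replicas across the brokers for you.
            NewTopic orders = new NewTopic("orders", 20, (short) 3);
            admin.createTopics(List.of(orders)).all().get();
        }
    }
}
```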
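Consumer sketch. The following sketch makes the assignor and rebalancing points concrete: it swaps the default range assignor for the cooperative sticky assignor and logs what a rebalance hands over when a group member starts or stops. The broker address, group id, and topic name are placeholders.

```java
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collection;
import java.util.List;
import java.util.Properties;

public class RebalanceDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "orders-app");              // hypothetical group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // Replace the default range assignor with the cooperative sticky assignor,
        // which rebalances incrementally instead of revoking everything at once.
        props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
                  CooperativeStickyAssignor.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"), new ConsumerRebalanceListener() {
                @Override
                public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                    // This consumer loses partitions when another member joins the group.
                    System.out.println("Revoked: " + partitions);
                }
                @Override
                public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                    // Partitions from a stopped consumer show up here on the survivors.
                    System.out.println("Assigned: " + partitions);
                }
            });
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    System.out.printf("p%d@%d: %s%n", record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```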
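Poison-pill sketch. One common defense, sketched below under assumed topic and group names, is to consume raw bytes and deserialize inside a try/catch, so a bad record is skipped (or routed to a dead-letter topic) instead of being retried forever and stalling the partition.

```java
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class PoisonPillSafeConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "payments-app");            // hypothetical group
        // Consume raw bytes so a malformed event cannot fail inside poll() itself.
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("payments")); // hypothetical topic
            while (true) {
                for (ConsumerRecord<byte[], byte[]> record : consumer.poll(Duration.ofSeconds(1))) {
                    try {
                        // Deserialize per record; a poison pill only costs one record.
                        System.out.println("Processed: " + parse(record.value()));
                    } catch (RuntimeException poisonPill) {
                        // Skip (or send to a dead-letter topic) and keep the partition moving.
                        System.err.printf("Skipping bad record at p%d@%d%n",
                                record.partition(), record.offset());
                    }
                }
            }
        }
    }

    static String parse(byte[] value) {
        if (value == null) throw new IllegalArgumentException("missing payload");
        return new String(value, StandardCharsets.UTF_8);
    }
}
```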
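Streams sketch. A minimal Kafka Streams application, with hypothetical topic and application names: every instance started with the same application.id joins the same consumer group, and Streams creates one task per input partition and spreads the tasks across the running instances.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

import java.util.Properties;

public class StreamsPartitionDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        // All instances sharing this application.id form one consumer group.
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "orders-enricher");    // hypothetical app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker address
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("orders")   // parallelism = partitions of "orders"
               .mapValues(v -> v.toUpperCase())    // per-event transformation
               .to("orders-normalized");           // hypothetical output topic

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

Because the maximum parallelism equals the input partition count, starting more instances than partitions leaves the extras idle, which is exactly why sizing the topic matters.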