"Disaster Recovery Options running Apache Kafka in Kubernetes" by Geetha Anne (Strange Loop 2022)

Discover the best practices for running Apache Kafka in Kubernetes, including disaster recovery options, replication strategies, and storage requirements, to ensure high availability and resiliency in your event-driven architecture.

Key takeaways
  • Apache Kafka on Kubernetes requires multiple availability zones and regions for resiliency.
  • Confluent Replicator and MirrorMaker are used for asynchronous replication between clusters (a MirrorMaker 2 sketch follows this list).
  • Observers replicate messages asynchronously and can be automatically promoted to followers in the in-sync replica set (ISR) when the ISR shrinks.
  • Kafka’s resiliency model includes features like leader election, ISR, and automatic client failover.
  • Monitoring is crucial for error handling and cluster management.
  • Eventsizer.io can be used to plan the sizing and placement of pods.
  • Pod affinity and anti-affinity rules can be defined to co-locate or isolate workloads (see the anti-affinity sketch after this list).
  • A storage class definition is required for both Kafka and ZooKeeper volumes (a StorageClass sketch follows the list).
  • Replicator and MirrorMaker support multi-region, multi-cloud data replication.
  • ZooKeeper configuration requires defining its dependencies along with log and data volume sizes.
  • Automatic observer promotion is a must for resiliency.
  • Topic replica placement policy is critical for resiliency (see the placement sketch after this list).
  • Recovery point objective (RPO) and recovery time objective (RTO) must be considered when designing Kafka workloads.
  • Kafka is a distributed system that combines a messaging queue with a file system (an append-only commit log).
  • Kubernetes offers functionality like ConfigMaps, Secrets, and operators (custom controllers) that suits Kafka deployments.
  • Kafka fits well with Kubernetes due to its scalability and reliability features.
  • Replicator and MirrorMaker support metrics aggregation using JMX and Jolokia.
  • Cluster Linking, the successor to Replicator, preserves offsets across clusters and simplifies failover.
  • Storage requirements for Kafka include high-performance disks like SSDs.
  • Kafka’s resiliency model ensures minimal downtime and a near-zero RTO in the event of a disaster.
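
To make the MirrorMaker takeaway concrete, here is a minimal sketch of a MirrorMaker 2 configuration shipped as a Kubernetes ConfigMap. The cluster aliases (`primary`, `dr`), the ConfigMap name, and the bootstrap addresses are illustrative assumptions, not details from the talk.

```yaml
# Hypothetical ConfigMap holding a MirrorMaker 2 properties file that
# asynchronously replicates topics from a primary cluster to a DR cluster.
apiVersion: v1
kind: ConfigMap
metadata:
  name: mm2-config   # assumed name
data:
  mm2.properties: |
    # Cluster aliases and bootstrap addresses are assumptions.
    clusters = primary, dr
    primary.bootstrap.servers = kafka-primary:9092
    dr.bootstrap.servers = kafka-dr:9092
    # One-way replication: mirror every topic from primary to dr.
    primary->dr.enabled = true
    primary->dr.topics = .*
    replication.factor = 3
```

A MirrorMaker 2 pod would mount this ConfigMap and launch Kafka's `connect-mirror-maker.sh` against the mounted `mm2.properties` file.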
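
The pod anti-affinity takeaway, sketched as a minimal StatefulSet that spreads brokers across availability zones. The `app: kafka` label, replica count, and image are assumptions; the zone topology key is the standard Kubernetes label.

```yaml
# Illustrative StatefulSet: anti-affinity forces broker pods into
# different availability zones (assumed label app: kafka).
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
spec:
  serviceName: kafka
  replicas: 3
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: kafka
              # One broker per zone; use kubernetes.io/hostname instead
              # to merely keep brokers on separate nodes.
              topologyKey: topology.kubernetes.io/zone
      containers:
        - name: kafka
          image: confluentinc/cp-server:7.0.1   # assumed image
```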
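
The storage takeaways (an explicit storage class backed by high-performance SSDs) sketched as a StorageClass; the GKE CSI provisioner and `pd-ssd` disk type are cloud-specific assumptions, and other clouds would substitute their own provisioner.

```yaml
# Illustrative SSD-backed StorageClass; Kafka and ZooKeeper
# volumeClaimTemplates would reference it via storageClassName: kafka-ssd.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: kafka-ssd
provisioner: pd.csi.storage.gke.io   # assumed GKE CSI driver
parameters:
  type: pd-ssd
volumeBindingMode: WaitForFirstConsumer   # bind only after the pod lands in a zone
reclaimPolicy: Retain                     # keep broker data if the claim is deleted
```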
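
Finally, the replica placement and observer takeaways, sketched as a Confluent replica placement policy. The rack names and replica counts are assumptions; the `observerPromotionPolicy` field corresponds to the automatic observer promotion feature mentioned above.

```json
{
  "version": 2,
  "replicas": [
    { "count": 2, "constraints": { "rack": "us-west" } },
    { "count": 2, "constraints": { "rack": "us-east" } }
  ],
  "observers": [
    { "count": 1, "constraints": { "rack": "us-east" } }
  ],
  "observerPromotionPolicy": "under-min-isr"
}
```

On Confluent Platform, a file like this is typically passed to `kafka-topics --create` via `--replica-placement`; with `under-min-isr`, an observer is promoted into the ISR whenever the in-sync set falls below `min.insync.replicas`.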