A Kafka Producer’s Request: Or, There and Back Again by Danica Fine

Follow a Kafka producer's request journey from serialization through batching, partitioning, and replication. Learn key metrics and configurations for optimal performance.

Key takeaways
  • To get data into Kafka, events must first be serialized into bytes since brokers only understand binary data

  • Batching configurations are critical for performance:

    • batch.size controls maximum batch size (default 16KB)
    • linger.ms controls how long to wait to fill batches (default 0ms)
    • buffer.memory must be larger than batch size to handle multiple batches
    • Compression can be enabled to reduce network usage
  • Partitioning strategies affect data distribution:

    • Default strategy uses message key hash
    • Sticky partitioning aims for uniform distribution
    • Round robin distributes across all partitions
    • Custom partitioners can be implemented
  • Key monitoring metrics:

    • Record queue time average
    • Request rate and latency
    • Batch size average vs configured batch size
    • Number of in-flight requests
    • Request handler thread utilization
  • Data replication workflow:

    • Data is first written to leader partition
    • Followers fetch updates from leader
    • Producer acknowledgment depends on acks setting
    • Replication factor determines number of copies
  • Important producer configurations:

    • max.in.flight.requests.per.connection limits concurrent requests
    • retries controls retry behavior (infinite by default)
    • request.timeout.ms sets timeout for broker responses
    • enable.idempotence prevents duplicate messages
    • Transaction support requires transactional.id
  • Brokers store data in segments consisting of:

    • Log files for event data
    • Index files for offset mapping
    • Time-based indices for timestamp-based access
  • Producer requests go through multiple stages:

    • Network threads handle connections
    • IO threads process requests
    • Request queue buffers incoming requests
    • Response queue holds outgoing responses