Kafka for .NET Developers - Ian Cooper - NDC Oslo 2024

Learn the fundamentals of Apache Kafka for .NET developers, covering core concepts, reliability patterns, the Confluent client SDK, and essential tools in the ecosystem.

Key takeaways
  • Kafka is a distributed, append-only log system originally created at LinkedIn for data lake ingestion, now widely used for messaging and event streaming

  • Messages in Kafka are organized into topics with partitions, where:

    • Each partition has a leader and followers for redundancy
    • Messages are immutable once written
    • Ordering is only guaranteed within a single partition
  • Key concepts for producers:

    • Messages consist of a key and value
    • Producer writes are asynchronous by default
    • Call Flush() before shutting down so that buffered messages are actually sent
    • Delivery guarantees are controlled with the acks setting (none, leader only, or all in-sync replicas)
  • Consumer patterns:

    • Consumers operate in consumer groups to scale processing
    • Each partition is read by only one consumer in a group at a time
    • Consumers track their position using offsets
    • Each consumer processes its messages on a single thread to preserve ordering
  • Schema management:

    • Schema Registry provides centralized schema storage
    • Supports Avro, Protobuf and JSON Schema formats
    • Handles schema evolution and compatibility
    • The first 5 bytes of a serialized message carry schema metadata (a magic byte followed by a 4-byte schema ID)
  • Reliability considerations:

    • Manual vs auto commit of offsets
    • Delivery reports for producer acknowledgements
    • Idempotent producers to prevent duplicates caused by retries
    • Outbox pattern for reliable integration
  • .NET specific details:

    • Confluent .NET client is the main SDK
    • Async/await support on the produce path (ProduceAsync); the consumer's Consume call is blocking
    • SerDes handle serialization/deserialization
    • Message pump pattern for consuming
  • Ecosystem includes many tools:

    • Kafka Connect for integrations
    • KSQL (now ksqlDB) for stream processing
    • UI tools for management
    • ZooKeeper is being replaced by KRaft for cluster metadata management
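
The point about ordering being guaranteed only within a partition can be illustrated with a small in-memory model. This is a sketch, not the talk's code: the Topic class, the partition count, and the CRC32-based partition_for function are all hypothetical stand-ins (real clients ship their own partitioners), and Python is used only so the example runs without a broker.

```python
import zlib

NUM_PARTITIONS = 3  # hypothetical partition count for the demo


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition. CRC32 is a stand-in for a real client partitioner."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class Topic:
    """A topic modelled as a list of append-only partition logs."""

    def __init__(self, num_partitions: int = NUM_PARTITIONS):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key: str, value: str):
        """Append a record and return (partition, offset)."""
        p = partition_for(key, len(self.partitions))
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1


if __name__ == "__main__":
    topic = Topic()
    events = [
        ("order-1", "created"),
        ("order-2", "created"),
        ("order-1", "paid"),
        ("order-1", "shipped"),
    ]
    for key, value in events:
        partition, offset = topic.append(key, value)
        print(f"{key}:{value} -> partition {partition}, offset {offset}")
```

Because every record with the same key hashes to the same partition, the events for one order keep their relative order, while nothing is promised about the order of events for different keys that land in different partitions.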
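
The consumer group rule (each partition is read by one consumer in a group) can be sketched as a simple assignment function. The round-robin strategy below is a simplified assumption for illustration; real group coordinators use configurable assignors with different behaviour, and the function name assign is hypothetical.

```python
def assign(partitions, consumers):
    """Give each partition to exactly one consumer, round-robin.

    Simplified stand-in for a real partition assignor.
    """
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(sorted(partitions)):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


if __name__ == "__main__":
    # Four partitions shared between two consumers.
    print(assign(range(4), ["consumer-a", "consumer-b"]))
    # More consumers than partitions: the extra consumer sits idle.
    print(assign(range(2), ["consumer-a", "consumer-b", "consumer-c"]))
```

The second call shows why the partition count caps the useful size of a group: a consumer with no partitions assigned has nothing to read.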
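
The 5-byte schema header mentioned under schema management can be shown with a few lines of byte packing. This sketch assumes the Confluent Schema Registry wire format (a zero magic byte followed by a 4-byte big-endian schema ID); the helper names frame and unframe are hypothetical, and the payload is treated as opaque bytes rather than real Avro, Protobuf, or JSON.

```python
import struct

MAGIC_BYTE = 0
HEADER = struct.Struct(">bI")  # 1-byte magic + 4-byte big-endian schema ID = 5 bytes


def frame(schema_id: int, payload: bytes) -> bytes:
    """Prefix a serialized payload with the 5-byte schema header."""
    return HEADER.pack(MAGIC_BYTE, schema_id) + payload


def unframe(message: bytes):
    """Split a framed message into (schema_id, payload)."""
    if len(message) < HEADER.size:
        raise ValueError("message is shorter than the 5-byte schema header")
    magic, schema_id = HEADER.unpack(message[:HEADER.size])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[HEADER.size:]


if __name__ == "__main__":
    framed = frame(42, b'{"name": "example"}')
    print(framed[:5].hex())
    print(unframe(framed))
```

A consumer that deserializes without a schema-aware deserializer will see these five bytes as garbage at the start of the value, which is a common source of confusion when mixing plain and Schema Registry serializers.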
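
The reliability takeaways (manual offset commits, duplicates) and the message pump pattern can be tied together in one small simulation. Everything here is a hypothetical model, not the Confluent client API: PartitionLog, run_consumer, and the crash_after parameter exist only to show that committing after processing gives at-least-once delivery, so the handler has to tolerate redelivery.

```python
class PartitionLog:
    """One partition plus the consumer group's committed offset."""

    def __init__(self, records):
        self.records = list(records)
        self.committed = 0  # offset the group resumes from after a restart


def run_consumer(log, handler, crash_after=None):
    """Message pump: read from the committed offset, handle, then commit.

    crash_after simulates a failure after handling that many records
    but before committing the last one.
    """
    offset = log.committed
    handled = 0
    while offset < len(log.records):
        handler(log.records[offset])
        handled += 1
        if crash_after is not None and handled == crash_after:
            return  # simulated crash: record handled, offset not committed
        offset += 1
        log.committed = offset  # manual commit after successful processing


seen_ids = set()
applied = []


def idempotent_handler(record):
    """Skip records whose ID was already applied, so redelivery is harmless."""
    message_id, body = record
    if message_id in seen_ids:
        return
    seen_ids.add(message_id)
    applied.append(body)


if __name__ == "__main__":
    log = PartitionLog([(1, "a"), (2, "b"), (3, "c")])
    run_consumer(log, idempotent_handler, crash_after=2)  # dies before committing record 2
    print("committed after crash:", log.committed)
    run_consumer(log, idempotent_handler)  # restart: record 2 is delivered again
    print("applied:", applied)
```

Committing before processing would flip the trade-off: a crash would then lose the in-flight record instead of redelivering it, which is why processing first and deduplicating in the handler is the usual choice when losing messages is not acceptable.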