CAPtivating architecture: Navigating Distributed Systems and Microservices by Alexandros Charos

Learn key patterns for building reliable distributed systems, from timeout configuration and transaction approaches to service boundaries and preventing cascading failures.

Key takeaways
  • Timeouts and retry policies are critical but often misconfigured - studies show 31% of errors come from missing timeouts and 47% from incorrect timeout values

  • When implementing distributed transactions, consider two main approaches:

    • Choreography: Services react to events independently
    • Orchestration: Central service coordinates the workflow
  • Key factors for choosing between choreography and orchestration:

    • Choreography works well with existing event-driven architectures
    • Orchestration is better for complex flows and maintaining visibility
    • Orchestration can lead to tighter coupling between services
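
The contrast between the two approaches can be sketched in a few lines. This is a minimal in-process illustration, not a production design: the `EventBus` class, the event names (`order_placed`, `payment_taken`), and the `log` list are all hypothetical names chosen for the example.

```python
from collections import defaultdict


class EventBus:
    """Tiny in-memory publish/subscribe bus (hypothetical, for illustration)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in list(self._handlers[event_type]):
            handler(payload)


def build_choreography(bus, log):
    """Choreography: each service reacts to an event and emits the next one."""

    def on_order_placed(order):
        log.append(("payment", order["id"]))
        bus.publish("payment_taken", order)

    def on_payment_taken(order):
        log.append(("shipping", order["id"]))

    bus.subscribe("order_placed", on_order_placed)
    bus.subscribe("payment_taken", on_payment_taken)


def orchestrate_order(order, log):
    """Orchestration: one coordinator owns the sequence of steps."""
    log.append(("payment", order["id"]))   # step 1: call the payment service
    log.append(("shipping", order["id"]))  # step 2: call the shipping service
```

In the choreographed version the workflow is implicit in the subscriptions, while in the orchestrated version it is readable in one function. That is the visibility trade-off the bullets above describe.
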
  • Implement idempotency to handle retries safely:

    • Request fingerprinting on server side
    • Client request IDs
    • Hash functions to detect duplicate requests
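
A minimal sketch of how these three techniques could fit together, assuming an in-memory store and JSON-serializable payloads. The `IdempotentHandler` class and its method names are hypothetical; a real service would use a shared store with expiry rather than a dict.

```python
import hashlib
import json


class IdempotentHandler:
    """Caches responses by idempotency key so a retry does not repeat side effects."""

    def __init__(self, process):
        self._process = process  # the real, side-effecting operation
        self._seen = {}          # key -> cached response (in-memory, illustration only)

    @staticmethod
    def fingerprint(payload):
        # Canonical JSON so that key order does not change the hash.
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def handle(self, payload, request_id=None):
        # Prefer a client-supplied request ID; otherwise fingerprint on the server.
        if request_id is not None:
            key = ("id", request_id)
        else:
            key = ("fp", self.fingerprint(payload))
        if key in self._seen:
            return self._seen[key]  # duplicate: return the stored response
        response = self._process(payload)
        self._seen[key] = response
        return response
```

With this in place, a retried request returns the stored response instead of running the operation a second time.
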
  • When calculating proper timeout values:

    • Measure 99th percentile response times
    • Consider both connection and read timeouts
    • Add buffer time for critical services
    • Include jitter in retry logic
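
As a rough sketch of the steps above: derive a read timeout from measured latencies, then spread retries out with jitter. The function names, the 1.5x buffer factor, and the backoff base and cap are assumptions chosen for illustration, not values from the talk.

```python
import random


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list; pct is an integer 1..100."""
    ordered = sorted(samples)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]


def read_timeout_ms(latencies_ms, buffer_factor=1.5):
    """99th percentile latency plus headroom (buffer_factor is an assumption)."""
    return percentile(latencies_ms, 99) * buffer_factor


def backoff_delays(attempts, base_s=0.1, cap_s=5.0, rng=random):
    """Full-jitter exponential backoff: each delay is uniform in
    [0, min(cap_s, base_s * 2**n)] for retry number n."""
    return [rng.uniform(0, min(cap_s, base_s * 2 ** n)) for n in range(attempts)]
```

The connection timeout is usually set separately and much lower than the read timeout, since establishing a connection should be fast. Many HTTP clients accept the two values separately; Python's `requests` library, for example, takes a `(connect, read)` tuple as its `timeout` argument.
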
  • Key considerations for microservice boundaries:

    • Code volatility (frequency of changes)
    • Fault tolerance requirements
    • Security boundaries
    • Scalability needs
    • Domain contexts
  • Circuit breaking and rate limiting are essential for preventing cascading failures
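
A minimal circuit breaker sketch, assuming a simple consecutive-failure threshold and a fixed cool-down. The class name, defaults, and injected `clock` parameter are hypothetical, and the sketch is not thread-safe. Rate limiting is not shown.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the dependency."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after_s:
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: half-open, so let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Failing fast while the circuit is open keeps callers from piling up on a dependency that is already struggling, which is how a local failure is stopped from cascading.
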

  • Document distributed workflows carefully to avoid creating unmaintainable “event-driven mud”

  • Total system availability decreases multiplicatively with each service a request depends on
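
The arithmetic behind this point, assuming every dependency must be up for a request to succeed and that failures are independent (a simplification). The function name is hypothetical.

```python
import math


def serial_availability(availabilities):
    """Availability of a request that needs every listed service to be up,
    assuming independent failures: the product of the individual values."""
    return math.prod(availabilities)


# Ten services at 99.9% each give roughly 99.0% for the whole chain.
chain = serial_availability([0.999] * 10)
```

Each added dependency multiplies in another factor below 1, so the combined figure can only go down.
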

  • Consider organizational readiness and operational capabilities before implementing distributed architectures