Continuous Verification: Beyond Chaos Engineering • Cat Swetel • YOW! 2020

Learn how continuous verification expands on chaos engineering by constantly monitoring system safety boundaries and drift patterns in production, so that potential failures are caught early.

Key takeaways
  • Continuous verification goes beyond chaos engineering by constantly testing system boundaries and safety margins in production environments, rather than only checking whether specific scenarios are safe

  • Complex systems drift towards failure gradually, through many small changes rather than single catastrophic events. Because each small deviation comes to be accepted as normal (the “normalization of deviance”), it is critical to monitor safety boundaries proactively

  • The safety margin in a system is not static; it shifts constantly under economic pressure, workload changes, and system evolution. Continuous verification helps track these shifts over time

  • Automated tooling is essential for continuous verification since humans cannot consistently monitor and detect subtle shifts in system boundaries manually

  • Production testing is necessary because safety boundaries manifest differently in production than in staging environments, due to real-world interactions and complexities

  • Organizations need to balance delivering new features against the work of maintaining system health (“reproductive labor”). Continuous verification helps maintain this balance

  • The practice requires both observability tools and active system perturbation with closed feedback loops to understand boundaries

  • Success requires moving from a mindset of avoiding failure to treating failure as inevitable and building systems that can recover

  • Leading indicators that a system is approaching its safety boundaries include a rising frequency of incidents and the emergence of informal “workarounds”

  • Cultural challenges include the lack of a common vocabulary around the practice and a tendency to normalize deviations rather than address root causes
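
The takeaways on active perturbation with closed feedback loops, and on tracking a safety margin that shifts over time, can be illustrated with a small sketch. It is not taken from the talk. The names `probe_boundary`, `margin_is_shrinking`, and `simulated_system` are hypothetical, and the health check is a toy stand-in: a real setup would inject load into production and read health from its observability tooling.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ProbeResult:
    boundary: float  # highest probed load that still reported healthy
    margin: float    # boundary minus the baseline (current) load


def probe_boundary(
    is_healthy: Callable[[float], bool],
    baseline: float,
    step: float,
    max_load: float,
) -> ProbeResult:
    """Raise load step by step, stopping at the first unhealthy reading.

    The health check after every step is the closed feedback loop:
    the perturbation is abandoned as soon as the system reports trouble.
    """
    load = baseline
    while load + step <= max_load and is_healthy(load + step):
        load += step
    return ProbeResult(boundary=load, margin=load - baseline)


def margin_is_shrinking(margins: List[float], window: int = 3) -> bool:
    """Return True when the last `window` margins are strictly decreasing."""
    recent = margins[-window:]
    if len(recent) < window:
        return False
    return all(a > b for a, b in zip(recent, recent[1:]))


def simulated_system(capacity: float) -> Callable[[float], bool]:
    """Toy health check (assumption): healthy while load <= capacity."""
    return lambda load: load <= capacity


if __name__ == "__main__":
    margins: List[float] = []
    # Hypothetical capacities from three successive runs, eroding over time.
    for capacity in (180.0, 160.0, 130.0):
        result = probe_boundary(
            simulated_system(capacity), baseline=100.0, step=10.0, max_load=300.0
        )
        margins.append(result.margin)
    print(margins, margin_is_shrinking(margins))
```

Running the probe on a schedule and comparing margins across runs is what turns a one-off chaos experiment into a drift signal: no single run fails, yet the shrinking trend shows the boundary moving closer.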