Opening Keynote: Incident Management in the Age of DevOps, by Damon Edwards

In this talk, Damon Edwards explores the evolving landscape of incident management in the age of DevOps, highlighting the importance of observability, automation, and self-regulation amid complexity and failure.

Key takeaways
  • Incident management is evolving in the age of DevOps, with a focus on observability, automation, and self-regulation.
  • The old idea of escalation and firefighting is being replaced by more proactive approaches, such as autonomous runbooks and empowered teams.
  • Cloud-native technologies and microservices architectures require a rethink of incident management, with a focus on distributed, ephemeral systems.
  • The concept of an OODA loop (observe, orient, decide, act) is relevant to incident management, as teams need to rapidly respond to and learn from failures.
  • Automation is key to improving incident management, particularly in areas such as logging, monitoring, and diagnostics.
  • Runbooks are becoming increasingly important, as they provide self-service access to expert knowledge and empower teams to take action.
  • SRE (site reliability engineering) is a key concept in incident management, as it emphasizes the importance of reliable systems and automated governance.
  • The idea of “swarming” is becoming more popular, as teams seek to rapidly respond to and resolve incidents using autonomous, human-centered approaches.
  • Complexity and failure are inevitable, but understanding and learning from failures is essential to improving incident management and driving business value.
  • The importance of operations and the need to rethink how we manage and respond to incidents are key themes in the talk.
  • Automation, observability, and self-regulation will be increasingly important as we move forward in the age of DevOps.
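The OODA loop and self-service runbooks from the takeaways above can be combined into a single automated pass. The sketch below is a minimal illustration under stated assumptions, not something presented in the talk. The service names, the 5% error-rate threshold, and the `restart_service` runbook are all hypothetical. A production system would pull metrics from real monitoring and run the loop continuously. Any service with no matching runbook is escalated to a human responder.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# A runbook action takes a service name and returns a log message (hypothetical shape).
Runbook = Callable[[str], str]


@dataclass
class Observation:
    service: str
    error_rate: float  # fraction of failed requests, 0.0 to 1.0


def observe(metrics: Dict[str, float]) -> List[Observation]:
    """Observe: turn raw telemetry into structured observations."""
    return [Observation(name, rate) for name, rate in metrics.items()]


def orient(observations: List[Observation], threshold: float = 0.05) -> List[Observation]:
    """Orient: keep only services whose error rate exceeds the (assumed) threshold."""
    return [o for o in observations if o.error_rate > threshold]


def decide(
    unhealthy: List[Observation], runbooks: Dict[str, Runbook]
) -> List[Tuple[str, Optional[Runbook]]]:
    """Decide: pick a runbook per unhealthy service, or None if none exists."""
    return [(o.service, runbooks.get(o.service)) for o in unhealthy]


def act(plan: List[Tuple[str, Optional[Runbook]]]) -> List[str]:
    """Act: run the chosen runbook, or escalate to a human responder."""
    log: List[str] = []
    for service, runbook in plan:
        if runbook is None:
            log.append(f"escalate:{service}")  # no automation available
        else:
            log.append(runbook(service))
    return log


def ooda_cycle(metrics: Dict[str, float], runbooks: Dict[str, Runbook]) -> List[str]:
    """Run one pass of the loop; a real system would repeat it continuously."""
    return act(decide(orient(observe(metrics)), runbooks))


def restart_service(service: str) -> str:
    """Hypothetical self-service runbook step; here it only records the action."""
    return f"restart:{service}"


if __name__ == "__main__":
    metrics = {"checkout": 0.12, "search": 0.01, "payments": 0.20}
    print(ooda_cycle(metrics, {"checkout": restart_service}))
    # prints ['restart:checkout', 'escalate:payments']
```

Keeping each phase as a separate function makes it easy to swap in a different signal in `orient` or add new runbooks in `decide` without touching the rest of the loop.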