Actionable Observability - Lesley Cordero - NDC London 2024

Learn actionable observability strategies to empower technologists, prioritize issue resolution, and optimize monitoring and alerting for proactive incident management and data-informed decision-making.

Key takeaways
  • Actionable observability is about empowering technologists to do high-impact work with the right data and skills, shifting the focus from monitoring alone to debugging application issues.
  • Observability is the ability to understand the internals of your software systems, providing insights into problem debugging and resolution.
  • Incident management is the process of responding to incidents: prioritizing issues, identifying the root cause, and debugging and resolving the problem.
  • Monitoring and alerting are essential components of observability, focusing on proactive prevention, incident detection, and automated responses to avoid toil.
  • Key considerations for monitoring and alerting: prioritization, notification strategies, and the level and scope of automation.
  • Service level indicators (SLIs) such as latency, error rate, request distribution, and throughput are the measurements; service level objectives (SLOs) are the targets those measurements are held to.
  • Monitoring should be data-informed, focused on application reliability, user experience, and system understanding, leveraging automation to manage complexity and noise.
  • Alerting strategies should consider: incident detection, notification thresholds, target detection, and alert escalation chains.
  • Automation should focus on high-priority tasks, leave low-priority tasks to human technicians, and streamline workflows using monitoring and alerting tools.
  • Data analysis should consider: observability, monitoring, and alerting contexts to avoid data noise and provide actionable insights for improvement.
  • Product-contextual monitoring is crucial for adapting monitoring and alerting to product-related incidents, prioritizing the relationship between the product and its users.
  • Organizations should prioritize observability, monitoring, and alerting by investing in automated monitoring and empowering teams to make data-driven decisions.
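
The SLI/SLO and alerting takeaways above can be made concrete with a small sketch. This is an illustration only, not code from the talk: the `Request` record, the function names, and the `page_at` / `ticket_at` thresholds are all hypothetical. It also uses an error-budget burn rate to choose the notification level, which is a common SRE technique swapped in here as an assumption; the summary does not say which alerting method the talk recommends.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Request:
    """Hypothetical record of one served request in the evaluation window."""
    latency_ms: float
    ok: bool


def error_rate(requests: Sequence[Request]) -> float:
    """SLI: fraction of requests in the window that failed."""
    if not requests:
        return 0.0
    return sum(1 for r in requests if not r.ok) / len(requests)


def latency_percentile(requests: Sequence[Request], pct: float) -> float:
    """SLI: nearest-rank latency percentile (pct in 0..100) over the window."""
    if not requests:
        return 0.0
    ordered = sorted(r.latency_ms for r in requests)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def burn_rate(requests: Sequence[Request], slo_target: float) -> float:
    """Observed error rate divided by the error rate the SLO allows.

    A value of 1.0 means the error budget is being spent exactly as fast
    as the SLO permits; higher values mean it is being spent faster.
    """
    budget = 1.0 - slo_target
    return error_rate(requests) / budget


def alert_level(rate: float, page_at: float = 10.0, ticket_at: float = 2.0) -> str:
    """Map a burn rate to a notification level.

    The thresholds are illustrative assumptions, not recommended values.
    """
    if rate >= page_at:
        return "page"
    if rate >= ticket_at:
        return "ticket"
    return "none"
```

For example, a window of 1,000 requests with 5 failures measured against a 99.9% availability SLO has an error rate of 0.5% against an allowed 0.1%, so the burn rate is about 5 and `alert_level` returns "ticket" rather than paging someone. Separating the measurement (SLI), the target (SLO), and the notification decision keeps each piece independently tunable, which is one way to reduce alert noise.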