Devoxx Greece 2024 - Kubernetes Resiliency by Chris Ayers


Learn how to design resilient and observable Kubernetes applications by setting baselines for resource requests, leveraging availability zones, and implementing monitoring, observability, and backup strategies.

Key takeaways

Creating a Baseline for Kubernetes

Set a baseline for resource requests and limits in Kubernetes applications
Requests should be based on average usage, not minimum
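The averaging rule above can be sketched in Python. This is a minimal illustration that assumes you already have per-pod CPU (millicores) and memory (MiB) usage samples exported from your metrics system. The function name and sample values are hypothetical and do not come from the talk.

```python
import math
from statistics import mean

def baseline_requests(cpu_millicores, memory_mib):
    """Hypothetical helper: derive requests from the AVERAGE observed usage."""
    # Using the mean (not the minimum) makes the scheduler reserve what the
    # pod typically consumes. Round up so fractional values are not under-requested.
    return {
        "cpu": f"{math.ceil(mean(cpu_millicores))}m",
        "memory": f"{math.ceil(mean(memory_mib))}Mi",
    }

cpu_samples = [120, 180, 250, 150]  # hypothetical CPU samples (millicores)
mem_samples = [200, 220, 260, 240]  # hypothetical memory samples (MiB)
print(baseline_requests(cpu_samples, mem_samples))  # {'cpu': '175m', 'memory': '230Mi'}
```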

Resiliency and Availability

Availability zones are critical for resiliency in Kubernetes
Use availability zones to spread workloads across multiple zones within a region
Haikus can be used to monitor availability
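One common way to spread replicas across zones is a topology spread constraint keyed on the well-known `topology.kubernetes.io/zone` node label. The sketch below builds that pod-spec fragment as a Python dict and prints it as JSON, which `kubectl apply -f` also accepts. The `app: web` label and the `maxSkew` of 1 are illustrative assumptions.

```python
import json

def zone_spread_constraint(app_label, max_skew=1):
    """Build a topologySpreadConstraints entry that spreads pods across zones."""
    return {
        # Max allowed difference in matching pod count between any two zones.
        "maxSkew": max_skew,
        # Well-known node label holding the node's availability zone.
        "topologyKey": "topology.kubernetes.io/zone",
        # Leave the pod Pending rather than pile replicas into one zone.
        "whenUnsatisfiable": "DoNotSchedule",
        "labelSelector": {"matchLabels": {"app": app_label}},
    }

# Hypothetical "web" app; this fragment goes under a pod template's spec.
pod_spec_fragment = {"topologySpreadConstraints": [zone_spread_constraint("web")]}
print(json.dumps(pod_spec_fragment, indent=2))
```

Switching `whenUnsatisfiable` to `ScheduleAnyway` trades strict zone balance for schedulability when a zone is short on capacity.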

Monitoring and Observability

Use metrics like CPU usage, memory usage, and queue lengths to monitor applications
Leverage tools like OpenTelemetry and distributed tracing for observability

Node and Resource Management

Use node pools and resource requests to manage compute resources
Resource limits are crucial for resource management
Use feature flags to manage rollout of new features and versions
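Here is a hedged sketch of a pod spec that pins a workload to a node pool and sets both requests and limits. The `nodepool` label key, the image name, and the resource values are hypothetical placeholders. Managed Kubernetes services label node pools with their own keys, so check your provider's documentation.

```python
import json

def container_with_resources(name, image, requests, limits):
    """Container spec with explicit requests (used for scheduling) and limits (hard caps)."""
    return {
        "name": name,
        "image": image,
        "resources": {"requests": requests, "limits": limits},
    }

pod_spec = {
    # Hypothetical node pool label; real key names vary by provider.
    "nodeSelector": {"nodepool": "general"},
    "containers": [
        container_with_resources(
            "web",
            "example.com/web:1.0",  # hypothetical image
            requests={"cpu": "250m", "memory": "256Mi"},
            limits={"cpu": "500m", "memory": "512Mi"},
        )
    ],
}
print(json.dumps(pod_spec, indent=2))
```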

Scaling and Autoscaling

Use horizontal pod autoscalers to scale applications based on demand
Leverage pod disruption budgets to keep a minimum number of pods available during voluntary disruptions, such as node drains and autoscaler scale-downs
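The HPA's documented scaling rule is desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). Scaling is skipped when that ratio is within a tolerance of 1.0, which is 0.1 by default. The sketch below approximates the rule for a single metric and also builds a `policy/v1` PodDisruptionBudget. The replica bounds, the `app: web` label, and `minAvailable: 2` are illustrative assumptions. The real controller adds behavior not modeled here, such as stabilization windows and handling of missing metrics.

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the HPA rule for one metric (no stabilization window)."""
    ratio = current_metric / target_metric
    # Skip scaling when the ratio is close enough to 1.0.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the HPA's minReplicas/maxReplicas bounds.
    return max(min_replicas, min(max_replicas, desired))

# A PodDisruptionBudget keeping at least two "web" pods running during
# voluntary disruptions such as node drains.
pdb = {
    "apiVersion": "policy/v1",
    "kind": "PodDisruptionBudget",
    "metadata": {"name": "web-pdb"},
    "spec": {"minAvailable": 2, "selector": {"matchLabels": {"app": "web"}}},
}

print(desired_replicas(4, 90, 60))  # CPU at 90% against a 60% target; prints 6
```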

Failure Domains and Rollback

Identify failure domains in Kubernetes applications
Use liveness, readiness, and startup probes to detect failures
Roll back deployments when failures occur
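The three probe types can be expressed in a container spec as shown below. This sketch assumes an HTTP service on port 8080 that exposes hypothetical `/healthz` and `/ready` endpoints. The timings are illustrative only.

```python
import json

def http_probe(path, port, period_seconds, failure_threshold):
    """Build an httpGet probe definition."""
    return {
        "httpGet": {"path": path, "port": port},
        "periodSeconds": period_seconds,
        "failureThreshold": failure_threshold,
    }

container = {
    "name": "web",
    "image": "example.com/web:1.0",  # hypothetical image
    # Startup probe: allows up to 10 s * 30 = 300 s to start; liveness and
    # readiness checks do not run until it succeeds.
    "startupProbe": http_probe("/healthz", 8080, 10, 30),
    # Liveness: the kubelet restarts the container after 3 consecutive failures.
    "livenessProbe": http_probe("/healthz", 8080, 10, 3),
    # Readiness: a failing pod is removed from Service endpoints, not restarted.
    "readinessProbe": http_probe("/ready", 8080, 5, 3),
}
print(json.dumps(container, indent=2))
```

If a rollout still misbehaves, `kubectl rollout undo deployment/<name>` reverts a Deployment to its previous revision.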

Backup and Disaster Recovery

Plan for backup and disaster recovery in Kubernetes applications
Use tools like Chaos Mesh for testing and validation
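Below is a minimal Chaos Mesh experiment that kills one pod so you can check that the workload recovers. It assumes Chaos Mesh is installed in the cluster and that the target pods carry a hypothetical `app: web` label in the `default` namespace. The dict is printed as JSON for `kubectl apply -f`.

```python
import json

pod_kill_experiment = {
    "apiVersion": "chaos-mesh.org/v1alpha1",
    "kind": "PodChaos",
    "metadata": {"name": "web-pod-kill", "namespace": "default"},
    "spec": {
        # Kill a pod; its controller (e.g. a Deployment) should replace it.
        "action": "pod-kill",
        # Pick a single pod from those matching the selector.
        "mode": "one",
        "selector": {
            "namespaces": ["default"],
            "labelSelectors": {"app": "web"},  # hypothetical app label
        },
    },
}
print(json.dumps(pod_kill_experiment, indent=2))
```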

Monitoring and Testing

Monitor applications and nodes in Kubernetes
Load test applications to ensure they can handle demand
Validate test results using metrics and logging
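Validating a load-test run against metrics can be as simple as checking a latency percentile and the error rate against budgets. This sketch uses the nearest-rank percentile method. The 300 ms p95 budget and 1% error-rate ceiling are hypothetical values, not recommendations from the talk.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    # Integer-friendly arithmetic avoids float rounding in the rank.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

def validate_load_test(latencies_ms, errors, total_requests,
                       p95_budget_ms=300, max_error_rate=0.01):
    """Return True if the run meets the hypothetical latency and error budgets."""
    p95 = percentile(latencies_ms, 95)
    error_rate = errors / total_requests
    return p95 <= p95_budget_ms and error_rate <= max_error_rate

latencies = list(range(1, 101))  # hypothetical latencies: 1..100 ms
print(percentile(latencies, 95))  # prints 95
print(validate_load_test(latencies, errors=0, total_requests=100))  # prints True
```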