Cultivating Production Excellence • Liz Fong-Jones • YOW! 2019
Cultivate production excellence beyond building systems.
- It’s not just about building the system, but also about making it work reliably for users.
- Redundancy and failover are essential, but don’t assume everything will always work as expected.
- Noisy or unclear alerts are detrimental, so it’s important to have a clear understanding of what constitutes an emergency.
- The concept of service level objectives (SLOs) should be revisited, and we need to measure availability and reliability in a more meaningful way.
- Rather than over-documenting, we should focus on understanding critical user journeys and what affects the users’ experience.
- Complexity should be addressed through ergonomic instrumentation paths, efficient data storage, and collaboration.
- Teams need to communicate effectively and have shared views on data to make decisions.
- Engineers should be empowered to make decisions and ask questions, and should be valued and rewarded for their contributions.
- Culture and processes play a significant role in making systems reliable and friendly.
- Measuring what’s important, such as user satisfaction and experience, is crucial.
- It’s important to iterate on our approach, testing and refining our methods, rather than sticking to a single framework or tool.
- We should prioritize building up the skills and abilities of individuals and teams, rather than relying on tools alone.
- The concept of a “blameless postmortem” is important, as it encourages learning from failures and improves our understanding of the system.
- Collaboration and communication are key to addressing outages and incidents, and we should focus on empowering individuals to make decisions.
- The frequency and severity of outages can have a significant impact on users, and we should prioritize reducing the impact of these incidents.
- Observability is essential, and we should invest in tools and culture to enable this.
- We should not be afraid to ask questions or challenge assumptions, and should prioritize the well-being and satisfaction of our users.
- It’s important to recognize that the human element is essential in making systems reliable; tools alone are not enough.
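
The point above about measuring availability through service level objectives can be made concrete with a little arithmetic. The following is a minimal sketch, not something taken from the talk: the `SloWindow` type, the function names, the event counts, and the 99.9% target are all hypothetical, chosen only to illustrate how an SLO target translates into an error budget.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Event counts over one SLO measurement window (hypothetical type)."""
    total_events: int  # all requests that count toward the SLO
    good_events: int   # requests that met the success criteria


def availability(window: SloWindow) -> float:
    """Fraction of events in the window that were good."""
    if window.total_events == 0:
        return 1.0  # no traffic, so nothing failed
    return window.good_events / window.total_events


def error_budget_remaining(window: SloWindow, target: float) -> float:
    """Fraction of the error budget still unspent; negative if overspent."""
    allowed_bad = (1.0 - target) * window.total_events
    actual_bad = window.total_events - window.good_events
    if allowed_bad == 0:
        # A 100% target (or an empty window) leaves no budget to spend.
        return 0.0 if actual_bad == 0 else float("-inf")
    return 1.0 - actual_bad / allowed_bad


# Assumed example numbers: one million requests, 600 of them bad.
window = SloWindow(total_events=1_000_000, good_events=999_400)
print(f"availability: {availability(window):.4%}")
print(f"budget remaining: {error_budget_remaining(window, target=0.999):.1%}")
```

With these assumed numbers, a 99.9% target allows roughly 1,000 bad requests, so 600 failures leave about 40% of the budget unspent. A shared number like this is one way for a team to agree on whether an incident is an emergency or can wait.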