DjangoCon Europe 2023 | Do the holes in Swiss cheese leak abstractions?

Discover how an accident that fits the Swiss cheese model led to a catastrophic outage at Kraken, highlighting the pitfalls of leaky abstractions and the value of convention-based coding, code review, and dry-run functionality.

Key takeaways
  • The concept of “leaky abstractions” was introduced by Joel Spolsky, whose “Law of Leaky Abstractions” states that all non-trivial abstractions are, to some degree, leaky: the details they are meant to hide eventually show through, causing unexpected failures or errors.
  • The Swiss cheese model of accident causation was used to explain how a catastrophe can occur when the weaknesses (“holes”) in several layers of defense happen to line up.
  • The speaker’s company, Kraken, experienced an outage caused by a combination of factors, including a long-running cron job, a failed database migration, and a poorly designed migration command.
  • A key contributor was a leaky abstraction in the code, which failed to manage database locks and transactions properly.
  • The speaker highlighted convention-based coding and code review as ways to prevent similar issues in the future.
  • Abstractions in software should be designed with their failure modes in mind, and developers should test and review code to confirm it behaves correctly when those failures occur, not just on the happy path.
  • The speaker also discussed “dry run” functionality, which lets developers simulate a migration and see what it would do without committing any changes to the production database.
  • Another important concept is tuning “TCP parameters” on RDS, which lets developers control how TCP connections behave and prevent issues such as connection timeouts and resource leaks.
  • Finally, the speaker emphasized the importance of learning from failures and using them as an opportunity to improve code and prevent future outages.
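As a rough illustration of the Swiss cheese model, the Python sketch below gives each layer of defense a probability of letting a problem through and computes the chance that the holes in every layer line up. It assumes the layers fail independently, which is a simplification, and the function name, layer names, and probabilities are hypothetical rather than figures from the talk.

```python
import math


def accident_probability(hole_probabilities: list[float]) -> float:
    """Chance that a problem slips through every layer of defense.

    Hypothetical helper; assumes the layers fail independently (a simplification).
    """
    for p in hole_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    # A failure only reaches production if the holes in *all* layers line up.
    return math.prod(hole_probabilities)


# Hypothetical per-layer miss rates, chosen for illustration only.
layers = {
    "code review misses the problem": 0.1,
    "no dry run before the migration": 0.5,
    "migration overlaps a long-running cron job": 0.2,
}
p = accident_probability(list(layers.values()))
print(f"chance all holes line up: {p:.3f}")  # prints "chance all holes line up: 0.010"
```

Removing a layer (setting its miss rate to 1.0) multiplies the risk accordingly, which is the model's argument for keeping several imperfect defenses rather than relying on any single one.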
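A dry-run option for migrations can be sketched with Python's built-in sqlite3 module: every statement runs inside one transaction, which is rolled back instead of committed when `dry_run` is true. This relies on the database supporting transactional DDL (SQLite and PostgreSQL both do). The helper names and the `account` table are hypothetical; this is not Kraken's implementation or Django's migration machinery.

```python
import sqlite3


def run_migration(conn: sqlite3.Connection, statements: list[str], dry_run: bool = False) -> None:
    """Run schema statements in one transaction; roll back instead of committing on a dry run.

    Hypothetical helper for illustration; assumes the database supports transactional DDL.
    """
    previous_level = conn.isolation_level
    conn.isolation_level = None  # take manual control of BEGIN/COMMIT/ROLLBACK
    try:
        conn.execute("BEGIN")
        try:
            for sql in statements:
                print(("[dry run] " if dry_run else "") + sql)
                conn.execute(sql)  # executed for real, so errors surface now
        except Exception:
            conn.execute("ROLLBACK")  # a failed migration leaves no partial changes
            raise
        conn.execute("ROLLBACK" if dry_run else "COMMIT")
    finally:
        conn.isolation_level = previous_level


def column_names(conn: sqlite3.Connection, table: str) -> list[str]:
    # PRAGMA table_info returns one row per column; index 1 is the name.
    # The table name is interpolated directly, so only pass trusted names.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY)")
migration = ["ALTER TABLE account ADD COLUMN balance INTEGER DEFAULT 0"]

run_migration(conn, migration, dry_run=True)
after_dry_run = column_names(conn, "account")   # schema unchanged

run_migration(conn, migration)
after_real_run = column_names(conn, "account")  # column now exists
```

Executing the statements, rather than only printing them, lets a dry run catch errors such as a missing table before anything is committed. One caveat: on PostgreSQL a dry run of this kind still acquires the same locks as the real migration until it rolls back.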