Safety Engineering: A Journey • Brad Tonkes • YOW! 2018

Learn how safety engineering prevents catastrophic failures through independent controls, a strong culture, and systematic training, and explore the key lessons and wider benefits.

Key takeaways
  • Safety engineering is distinct from reliability and resilience engineering: it focuses on managing unexpected situations and preventing catastrophic failures

  • Controls should be:

    • Independent from primary systems
    • Simple in design and implementation
    • Orthogonal (not duplicating primary functionality)
    • Minimal in number to avoid complexity
    • Validated through testing and monitoring
  • Successful safety programs require:

    • Clear top-down directives and management support
    • Whole organization involvement, not just dedicated teams
    • Systematic training on risks and controls
    • Regular review and reporting of KPIs
    • Cultural shift from “speed to market” to “safety first”
  • Common challenges in implementing safety controls:

    • Developer resistance and lack of motivation
    • Alert fatigue from false positives
    • Difficulty maintaining focus when controls are rarely used
    • Complexity in designing for unknown scenarios
    • Negative expected value: like insurance, a control costs something all the time and pays off only in the rare event it is needed
  • Key lessons learned:

    • Hubris is the enemy of safety
    • Put people and culture before technology
    • Controls need continuous refinement and patience
    • Controls that raise false positives need usability improvements
    • Controls require eternal vigilance even when not activated
  • Benefits beyond catastrophic prevention:

    • Improved incident detection and response
    • Better handling of everyday incidents
    • Enhanced reliability and resilience
    • More systematic approach to risk management
    • Stronger organizational accountability
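
To make the control properties listed under "Controls should be" more concrete, here is a minimal sketch in Python. It is not taken from the talk: the Action and LimitControl names, the numeric limit, and the counters are all hypothetical, chosen only to show a control that is independent of the primary system, does one simple check, and exposes counts that can be monitored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical request that a primary system wants to carry out."""
    name: str
    magnitude: float


class LimitControl:
    """Hypothetical safety control, assumed to run outside the primary system.

    Independent: it shares no code or state with the primary system.
    Simple and orthogonal: it does not repeat the primary logic, it only
    compares one number against a fixed limit.
    """

    def __init__(self, limit: float) -> None:
        self.limit = limit
        self.checked = 0  # monitoring: actions inspected
        self.blocked = 0  # monitoring: actions stopped

    def allows(self, action: Action) -> bool:
        self.checked += 1
        # Fail closed: anything that is not a number inside [0, limit]
        # is blocked. NaN fails the comparison and is blocked as well.
        ok = (
            isinstance(action.magnitude, (int, float))
            and 0 <= action.magnitude <= self.limit
        )
        if not ok:
            self.blocked += 1
        return ok


if __name__ == "__main__":
    control = LimitControl(limit=1000.0)
    for action in (
        Action("routine", 50.0),
        Action("runaway", 1e9),
        Action("corrupt", float("nan")),
    ):
        verdict = "allowed" if control.allows(action) else "blocked"
        print(f"{action.name}: {verdict}")
    print(f"checked={control.checked} blocked={control.blocked}")
```

Because a control like this should rarely fire in production, the checked and blocked counters are the kind of figures that could feed the regular KPI reviews mentioned above, and deliberately sending it a bad input is one way to validate that it still works.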