Database Disasters and How to Find Them - Christophe Pettus - PGCon 2022

Learn the common symptoms and patterns of database disasters, and how to track down performance issues, configuration changes, and unexpected behavior, with Christophe Pettus at PGCon 2022.

Key takeaways
  • Start by identifying the symptoms before looking for the cause
  • Don’t assume you know the cause of the problem
  • Use monitoring metrics to identify issues
  • IO latency is a key indicator of performance issues
  • Look for patterns in logs and metrics
  • Don’t assign blame or dwell on who did what
  • Focus on finding the root cause of the problem
  • Check for configuration changes and unexpected behavior
  • Don’t assume an incident came out of nowhere
  • Look for previous incidents and patterns
  • Identify the source of load and try to shed it
  • Check for connections, queries, and transactions
  • Watch for heavy subtransaction (savepoint) use, which can degrade performance under load; subtransactions do not prevent deadlocks
  • Check for disk space consumption and NAS server issues
  • Limit the concurrency of batch jobs
  • Review the database configuration and settings
  • Don’t assume a problem is with the application
  • Check for unexpected network failures
  • Look for recent changes and deployments
  • Use automated scripts to monitor for and resolve issues
  • Keep notes and documentation
  • Use checklists to ensure issues are resolved correctly
  • Don’t underestimate the importance of communication
  • Use clear and concise language to communicate with stakeholders
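
As an illustration of the "check for connections, queries, and transactions" takeaway, here is a minimal Python sketch that summarizes rows from PostgreSQL's pg_stat_activity view. It is not taken from the talk. The helper name summarize_activity, the five-minute threshold, and the shape of the input rows (plain dicts) are assumptions made for this example; only the view and its columns (pid, state, xact_start, wait_event_type, backend_type) come from PostgreSQL itself. The query is shown as a string so that any client library can run it.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Query against the standard pg_stat_activity view; run it with any client
# and pass the resulting rows (as dicts) to summarize_activity().
ACTIVITY_SQL = """
SELECT pid, state, xact_start, wait_event_type
FROM pg_stat_activity
WHERE backend_type = 'client backend'
"""


def summarize_activity(rows, now, max_xact_age=timedelta(minutes=5)):
    """Summarize sessions by state and flag likely trouble spots.

    rows: iterable of dicts with keys pid, state, xact_start, wait_event_type.
    now: timezone-aware datetime to measure transaction age against.
    max_xact_age: hypothetical threshold; tune it to your workload.
    """
    rows = list(rows)

    # How many sessions are in each state (active, idle, idle in transaction, ...).
    by_state = Counter(row["state"] or "unknown" for row in rows)

    # Sessions whose transaction has been open longer than the threshold.
    long_transactions = sorted(
        row["pid"]
        for row in rows
        if row["xact_start"] is not None
        and now - row["xact_start"] > max_xact_age
    )

    # Sessions currently waiting on a heavyweight lock.
    lock_waiters = sorted(
        row["pid"] for row in rows if row["wait_event_type"] == "Lock"
    )

    return {
        "by_state": dict(by_state),
        "long_transactions": long_transactions,
        "lock_waiters": lock_waiters,
    }


if __name__ == "__main__":
    # Hand-written sample rows standing in for a real query result.
    now = datetime.now(timezone.utc)
    sample = [
        {"pid": 1, "state": "active", "xact_start": now - timedelta(seconds=3),
         "wait_event_type": None},
        {"pid": 2, "state": "idle in transaction",
         "xact_start": now - timedelta(minutes=30), "wait_event_type": "Client"},
    ]
    print(summarize_activity(sample, now))
```

Keeping the summary logic separate from the database call means the same function can be run against a saved snapshot of pg_stat_activity taken during an incident, which fits the advice to keep notes and look for patterns across previous incidents.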