Breaking PostgreSQL at Scale — Christophe Pettus

Expert strategies for indexing, partitioning, and otherwise keeping a large PostgreSQL database healthy at scale.

Key takeaways
  • When dealing with a large database (100+ GB), indexing becomes a significant concern: indexes can grow very large and may no longer fit in memory.
  • Look for tables that can benefit from time-based partitioning, such as tables whose rows fall into well-defined timestamp ranges (a declarative-partitioning sketch follows this list).
  • Don’t create indexes prospectively. Instead, create them based on the real workload (see the index-usage query after this list).
  • Don’t set work_mem to an arbitrary value. Base it on the temporary files actually being written, which the logs will show, and experiment to find the optimal setting (an example follows this list).
  • When dealing with a large database, it’s essential to have a backup and recovery strategy in place.
  • Consider using pgBackRest or Barman for backup and recovery.
  • When dealing with a large database, it’s essential to have a load balancing strategy in place.
  • Consider using logical replication to replicate data between nodes (a publication/subscription example follows this list).
  • When dealing with a large database, it’s essential to have a strategy for handling queries that return a large number of rows.
  • Consider using partitioning to reduce the amount of data each query has to touch and improve query performance.
  • Queries that demand large amounts of memory, CPU, disk space, or I/O also need a plan: scale up with more RAM, CPU, and larger or faster storage where that is enough, and otherwise spread partitions across multiple disks or parallelize work across multiple nodes.
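
A minimal sketch of time-based partitioning, assuming a hypothetical events table keyed by a created_at timestamp; declarative range partitioning lets expired data be detached and dropped as a cheap metadata operation instead of a bulk DELETE:

    -- Hypothetical append-mostly table, partitioned by month on its timestamp.
    CREATE TABLE events (
        id         bigint GENERATED ALWAYS AS IDENTITY,
        created_at timestamptz NOT NULL,
        payload    jsonb
    ) PARTITION BY RANGE (created_at);

    -- One partition per month.
    CREATE TABLE events_2024_01 PARTITION OF events
        FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
    CREATE TABLE events_2024_02 PARTITION OF events
        FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

    -- Retiring old data becomes a metadata operation, not a huge DELETE.
    ALTER TABLE events DETACH PARTITION events_2024_01;
    DROP TABLE events_2024_01;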
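
To let the real workload drive indexing decisions, one approach (a sketch; the index name below is hypothetical) is to check pg_stat_user_indexes for indexes that are never scanned, and to build any index a real query pattern does need concurrently so it doesn’t block writes:

    -- Indexes the workload never uses are candidates for removal.
    SELECT schemaname, relname, indexrelname, idx_scan,
           pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
    FROM pg_stat_user_indexes
    WHERE idx_scan = 0
    ORDER BY pg_relation_size(indexrelid) DESC;

    -- When a real query pattern does need an index, build it without blocking writes.
    CREATE INDEX CONCURRENTLY idx_events_created_at ON events (created_at);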
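
For sizing work_mem from the logs rather than guessing, a sketch (the 64MB figure is purely illustrative): log every temporary file, read the sizes reported in the logs, and prefer raising work_mem per session or per query rather than server-wide:

    -- Log every temporary file; the threshold is in kB, and 0 means log them all.
    ALTER SYSTEM SET log_temp_files = 0;
    SELECT pg_reload_conf();

    -- Size work_mem from the temp-file sizes the logs report, raising it for the
    -- session or query that needs it rather than globally.
    SET work_mem = '64MB';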
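
A minimal logical-replication sketch (the publication, subscription, host, and database names are hypothetical); the table definition must already exist on the subscriber, after which the subscription copies the existing rows and then streams subsequent changes:

    -- On the publishing node:
    CREATE PUBLICATION events_pub FOR TABLE events;

    -- On the subscribing node:
    CREATE SUBSCRIPTION events_sub
        CONNECTION 'host=primary.example.com dbname=appdb user=replicator'
        PUBLICATION events_pub;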