Breaking PostgreSQL at Scale — Christophe Pettus

Expert strategies for indexing, partitioning, and otherwise keeping a large PostgreSQL database healthy at scale.

Key takeaways
  • When dealing with a large database (100+ GB), indexing becomes a significant concern: indexes can grow very large and may no longer fit in memory.
  • Look for tables that can benefit from time-based partitioning, such as tables whose rows fall into well-defined timestamp ranges (a declarative-partitioning sketch follows this list).
  • Don’t create indexes prospectively. Instead, create them based on the real workload (see the index-usage query after this list).
  • Don’t set work_mem to an arbitrary value. Base it on the temporary files actually being written, which the logs will show, and experiment to find the optimal setting (an example follows this list).
  • When dealing with a large database, it’s essential to have a backup and recovery strategy in place.
  • Consider using pgBackRest or Barman for backup and recovery.
  • When dealing with a large database, it’s essential to have a load balancing strategy in place.
  • Consider using logical replication to replicate data between nodes (a publication/subscription example follows this list).
  • When dealing with a large database, it’s essential to have a strategy for handling queries that return a large number of rows.
  • Consider using partitioning to reduce the amount of data each query has to touch and improve query performance.
  • Queries that demand large amounts of memory, CPU, disk space, or I/O also need a plan: scale up with more RAM, CPU, and larger or faster storage where that is enough, and otherwise spread partitions across multiple disks or parallelize work across multiple nodes.
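
A minimal sketch of time-based partitioning, assuming a hypothetical events table keyed by a created_at timestamp; declarative range partitioning lets expired data be detached and dropped as a cheap metadata operation instead of a bulk DELETE:

    -- Hypothetical append-mostly table, partitioned by month on its timestamp.
    CREATE TABLE events (
        id         bigint GENERATED ALWAYS AS IDENTITY,
        created_at timestamptz NOT NULL,
        payload    jsonb
    ) PARTITION BY RANGE (created_at);

    -- One partition per month.
    CREATE TABLE events_2024_01 PARTITION OF events
        FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
    CREATE TABLE events_2024_02 PARTITION OF events
        FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

    -- Retiring old data becomes a metadata operation, not a huge DELETE.
    ALTER TABLE events DETACH PARTITION events_2024_01;
    DROP TABLE events_2024_01;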
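
To let the real workload drive indexing decisions, one approach (a sketch; the index name below is hypothetical) is to check pg_stat_user_indexes for indexes that are never scanned, and to build any index a real query pattern does need concurrently so it doesn’t block writes:

    -- Indexes the workload never uses are candidates for removal.
    SELECT schemaname, relname, indexrelname, idx_scan,
           pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
    FROM pg_stat_user_indexes
    WHERE idx_scan = 0
    ORDER BY pg_relation_size(indexrelid) DESC;

    -- When a real query pattern does need an index, build it without blocking writes.
    CREATE INDEX CONCURRENTLY idx_events_created_at ON events (created_at);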
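
For sizing work_mem from the logs rather than guessing, a sketch (the 64MB figure is purely illustrative): log every temporary file, read the sizes reported in the logs, and prefer raising work_mem per session or per query rather than server-wide:

    -- Log every temporary file; the threshold is in kB, and 0 means log them all.
    ALTER SYSTEM SET log_temp_files = 0;
    SELECT pg_reload_conf();

    -- Size work_mem from the temp-file sizes the logs report, raising it for the
    -- session or query that needs it rather than globally.
    SET work_mem = '64MB';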
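
A minimal logical-replication sketch (the publication, subscription, host, and database names are hypothetical); the table definition must already exist on the subscriber, after which the subscription copies the existing rows and then streams subsequent changes:

    -- On the publishing node:
    CREATE PUBLICATION events_pub FOR TABLE events;

    -- On the subscribing node:
    CREATE SUBSCRIPTION events_sub
        CONNECTION 'host=primary.example.com dbname=appdb user=replicator'
        PUBLICATION events_pub;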