Alexander Sosna: How we execute PG major upgrades at GitLab, with zero downtime. (PGConf.EU 2023)
GitLab's approach to executing PostgreSQL major upgrades with zero downtime, using a two-cluster strategy, incremental upgrades, and custom solutions for sequences and logical replication.
- To perform a PostgreSQL major upgrade with zero downtime, GitLab uses a two-cluster approach, with a target cluster upgraded to the new version and a standby cluster following behind.
- To minimize the impact on users, the upgrade is performed during off-peak hours, and the target cluster is upgraded incrementally, with logical replication keeping its data synchronized with the source cluster.
- The upgrade process involves creating a new target cluster, streaming data from the source cluster to the new target cluster, and then switching the production load to the new target cluster.
- To handle sequences, whose state is not carried over by PostgreSQL logical replication, GitLab uses a custom solution involving a sequence number generator and a logical replication slot.
- The upgrade process is automated using Chef, a configuration management tool, which ensures that machines are provisioned and configured correctly.
- The team also uses rsync to transfer data between the clusters and keep them synchronized.
- Sequences are critical, but PostgreSQL logical replication does not replicate them, so GitLab relies on a custom solution to carry them over.
- Logical replication is complex and requires careful testing and validation.
- Schema changes are also handled automatically using Chef.
- The upgrade process involves several steps, including creating a new target cluster, streaming data, and switching the production load. Each step is carefully tested and validated.
- The team tests extensively, including regression testing and QA testing, to ensure that the application works correctly after the upgrade.
- They also use a benchmarking environment to test the upgrade and ensure that it meets performance requirements.
- The upgrade process is designed to have zero user impact, with all data replicated and available on the new cluster before switching the production load.
- The team avoids a YOLO ("you only live once") approach: the upgrade is thoroughly tested and validated before deployment.
- The upgrade process is monitored and optimized continuously, with improvements made based on feedback from users and performance metrics.
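
The summary above does not describe GitLab's custom sequence solution in detail, so the following is only a minimal sketch of one common way to carry sequence state across a logical-replication cutover, not the team's actual implementation. It assumes the last values have already been read from the source cluster (for example from the `pg_sequences` view) and builds `setval` statements to run on the target. The function names and the `safety_margin` parameter are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical input shape: (schema, sequence name) -> last_value as read on
# the source cluster. A value of None means the sequence was never used.
SequenceState = Dict[Tuple[str, str], Optional[int]]


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_setval_statements(source_values: SequenceState,
                            safety_margin: int = 1000) -> List[str]:
    """Build one setval() call per used sequence, to be run on the target.

    The safety margin (an assumption of this sketch) skips ahead so that
    values handed out on the source after the snapshot cannot collide with
    values handed out on the target after the switchover.
    """
    statements = []
    for schema, sequence in sorted(source_values):
        last_value = source_values[(schema, sequence)]
        if last_value is None:
            continue  # never-used sequence: keep the target's initial state
        qualified = f"{quote_ident(schema)}.{quote_ident(sequence)}"
        literal = qualified.replace("'", "''")  # escape for the SQL string
        statements.append(
            f"SELECT setval('{literal}', {last_value + safety_margin}, true);"
        )
    return statements


if __name__ == "__main__":
    snapshot = {("public", "users_id_seq"): 41, ("public", "unused_seq"): None}
    for statement in build_setval_statements(snapshot, safety_margin=100):
        print(statement)
```

Generating the statements as plain text keeps the step easy to review and replay. Choosing the margin is a trade-off: it has to exceed the number of values the source can consume between the snapshot and the switchover.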