José Neves: A journey into PostgreSQL logical replication (PGConf.EU 2023)

Learn how to overcome common Postgres logical replication pitfalls with José Neves.

Key takeaways
  • Imbalanced data replication can lead to inconsistencies in an application’s reports. To solve this issue, the authors of this presentation chose to use Postgres logical replication.
  • Logical replication uses LSNs (Log Sequence Numbers), which are positions in the write-ahead log, to track the offset of a transaction. LSN offsets are important for ordering the events in the replication stream.
  • The authors initially assumed that LSN offsets would always increase from one transaction to the next in the replication stream, but they soon found that this is not the case. LSN offsets can be non-consecutive because of the way Postgres logs events.
  • The authors had to change their approach to logical replication to ensure consistent data replication. They chose to commit only on transaction end offsets.
  • The authors also found that tracking individual operations (e.g., inserts, updates) instead of whole transactions led to data duplication. They recommend treating each transaction as a unit.
  • The authors used a custom CDC (Change Data Capture) pipeline to handle data replication. They built the pipeline in-house because they wanted control over the entire process.
  • The pipeline forwards data-changing events from the replication stream to an event messaging service.
  • The authors found that using Debezium for CDC would not have allowed them to customize the pipeline as much as they needed.
  • The central lesson is to understand how LSN offsets behave in Postgres logical replication, and to commit only on transaction end offsets so that replicated data stays consistent.
  • Concurrency issues need to be considered when working with Postgres logical replication.
  • The authors’ custom CDC pipeline and custom event messaging service were designed to handle large amounts of data while preserving data consistency.
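
The takeaway about committing only on transaction end offsets can be illustrated with a small simulation. This is a minimal sketch and not the authors' pipeline: the Event and TransactionBatcher names, the event kinds, and the LSN values are all hypothetical, and the stream is a hand-written list rather than a real replication connection. The one Postgres detail it relies on is the textual LSN format, two hexadecimal numbers separated by a slash that together encode a 64-bit position. The sketch assumes two concurrent transactions whose changes interleave in the log, so the change offsets seen by the consumer go backwards while the commit offsets keep increasing.

```python
from dataclasses import dataclass, field
from typing import List, Optional


def parse_lsn(text: str) -> int:
    """Convert a textual LSN such as '0/16B3748' into a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


@dataclass
class Event:
    """Hypothetical, simplified replication event (not a real wire format)."""
    kind: str                      # "begin", "change", or "commit"
    xid: int                       # transaction id
    lsn: int                       # offset attached to this event
    payload: Optional[str] = None  # only set for "change" events


@dataclass
class TransactionBatcher:
    """Buffers changes and checkpoints only when a transaction ends."""
    confirmed_lsn: int = 0
    published: List[List[str]] = field(default_factory=list)
    pending: List[str] = field(default_factory=list)

    def handle(self, event: Event) -> None:
        if event.kind == "begin":
            self.pending = []
        elif event.kind == "change":
            # Buffer the change; do NOT checkpoint on its LSN.
            self.pending.append(event.payload or "")
        elif event.kind == "commit":
            # Publish the transaction as one unit, then checkpoint
            # on the transaction end offset.
            self.published.append(self.pending)
            self.pending = []
            self.confirmed_lsn = event.lsn


def simulated_stream() -> List[Event]:
    """Assumed scenario: tx 2 commits before tx 1, so it is delivered first,
    even though tx 1 wrote its first change earlier in the log."""
    return [
        Event("begin", 2, parse_lsn("0/140")),
        Event("change", 2, parse_lsn("0/120"), "t2:insert"),
        Event("commit", 2, parse_lsn("0/140")),
        Event("begin", 1, parse_lsn("0/180")),
        Event("change", 1, parse_lsn("0/100"), "t1:insert-a"),
        Event("change", 1, parse_lsn("0/160"), "t1:insert-b"),
        Event("commit", 1, parse_lsn("0/180")),
    ]


def run_demo() -> TransactionBatcher:
    batcher = TransactionBatcher()
    for event in simulated_stream():
        batcher.handle(event)
    return batcher


def change_lsns() -> List[int]:
    """Offsets of individual operations, in delivery order."""
    return [e.lsn for e in simulated_stream() if e.kind == "change"]


def commit_lsns() -> List[int]:
    """Transaction end offsets, in delivery order."""
    return [e.lsn for e in simulated_stream() if e.kind == "commit"]
```

In this simulated stream the change offsets arrive out of order, so a consumer that checkpointed after each operation would record a position that later appears to move backwards, and a restart from such a position could skip or replay work. Checkpointing on the commit offsets avoids this in the sketch, because those only increase and each transaction is published as a whole.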