Monday, 7 November 2022

Logical Replication Improvements in PostgreSQL-15

There are various areas in PostgreSQL like Partitioning, Logical Replication, Parallel Query, Vacuum, etc. which improve with each new version. In this blog, I'll summarize the various enhancements in Logical Replication that users could see in the recently released PostgreSQL 15. You can read the enhancements in this area in the previous release in one of my previous blogs.

Allow replication of prepared transactions:

In the last release, we allowed logical decoding of prepared transactions and with this release, we added the support to replicate prepared transactions to built-in logical replication. Previously, we send the changes of the prepared transaction only once the commit prepared had been done. Users can enable replication at PREPARE time with the following syntax:

CREATE PUBLICATION  mypub FOR ALL TABLES;

CREATE SUBSCRIPTION mysub CONNECTION 'dbname=postgres' PUBLICATION  mypub WITH (two_phase = true);

The key advantages of this feature are:

(a) Reduces the lag to replicate data by replicating it at PREPARE time instead of waiting till the COMMIT PREPARED

(b) This provides the base to build conflict-free logical replication because if the prepare fails on subscriber nodes then one can rollback it on the publisher node as well.

The key implementation points:

(a) The replication of prepared transactions is enabled once the initial sync for all the tables is finished.

(b) To avoid conflicts in the prepared transaction during APPLY, we use the prepare identifier as pg_gid_<subscriber-id>_<transaction-id>.

(c) It is not allowed to change this option with ALTER SUBSCRIPTION command.

(d) ALTER SUBSCRIPTION REFRESH PUBLICATION is allowed with copy_data=false once the two_phase is enabled for a subscription.

For a detailed description of this feature, see the blog.

Allow replication of all tables in the schema:

Previously, one needs to specify all the tables of a particular schema while creating a publication if they want to publish all tables of that schema. Then, if later, the user creates more tables in that schema, they also need to be added to the publication separately. This would be inconvenient for users. This feature makes it much easier for users by allowing them to specify just the schema name in case they want all tables of the schema to be published. The syntax to specify schema name (TABLES IN SCHEMA) is as follows:

CREATE PUBLICATION mypub FOR TABLES IN SCHEMA mysch;

CREATE PUBLICATION mypub FOR TABLE mytab, TABLES IN SCHEMA mysch;

Note that it is allowed to specify schemas with individual tables from other schemas. Users can add the schemas to existing publications with the following syntax:

ALTER PUBLICATION mypub ADD TABLES IN SCHEMA mysch;

Note that adding schemas to a publication that is already subscribed to by some subscribers will require an ALTER SUBSCRIPTION … REFRESH PUBLICATION action on the subscriber side in order to become effective.

For a detailed description of this feature, see the blog.

Allow specifying row filters for logical replication of tables:

This feature allows specifying an additional WHERE clause after each table in the publication definition. Rows that don't satisfy this WHERE clause will be filtered out. This allows a set of tables to be partially replicated. The row filter is per table. The WHERE clause must be enclosed in parentheses. Users can define row filters with the following command:

CREATE PUBLICATION mypub FOR TABLE mytab1 WHERE (c1 > 10 and c2 < 20), mytab2 WHERE (c3 LIKE 'bob');

Users are allowed to specify row filters for existing tables in publication with the command:

ALTER PUBLICATION mypub SET TABLE mytab1 WHERE (c1 > 10 and c2 < 20), mytab2 WHERE (c3 LIKE 'bob');

This can help distribute data among nodes, improve performance by sending data selectively, and by hiding some sensitive data.

Key points to note about this feature:

(a) The row filter WHERE clause for a table added to a publication that publishes UPDATES and/or DELETES must contain only columns that are covered by REPLICA IDENTITY.

(b) The row filter WHERE clause for a table added to a publication that publishes INSERT can use any column.

(c) Row filters are ignored for TRUNCATE TABLE commands.

(d) If the row filter evaluates to NULL, it is regarded as "false" aka the corresponding row won't be replicated.

(e) The WHERE clause only allows simple expressions that don't have user-defined functions, user-defined operators, user-defined types, user-defined collations, non-immutable built-in functions, or references to system columns.

(f) During initial table synchronization, only data that satisfies the row filters is copied to the subscriber.

(g) For partitioned tables, the publication parameter publish_via_partition_root determines if it uses the partition's row filter (if the parameter is false, the default) or the root partitioned table's row filter.

For a detailed description of this feature, see docs and blog.

Allow specifying column lists for logical replication of tables:

This feature allows specifying an optional column list when adding a table to logical replication. Columns not included in this list are not sent to the subscriber, allowing the schema on the subscriber to be a subset of the publisher schema. The choice of columns can be based on behavioral or performance reasons. Users can define column lists with the following syntax:

CREATE PUBLICATION mypub FOR TABLE mytab1 (c1, c2), mytab2 (c3);

Users are allowed to specify column lists for existing tables in publication with the command:

ALTER PUBLICATION mypub SET TABLE mytab1 (c1, c2);

Key points to note about this feature:

(a) If a publication publishes UPDATES and/or DELETES, any column list must include the table's replica identity columns.

(b) If a publication publishes only INSERT operations, then the column list may omit replica identity columns.

(c) Column lists are ignored for TRUNCATE TABLE commands.

(d) A column list can contain only simple column references.

(e) A column list can't be specified if the publication also publishes FOR TABLES IN SCHEMA.

(f) During initial data synchronization, only the published columns are copied.

(g) For partitioned tables, the publication parameter publish_via_partition_root determines which column list is used. If publish_via_partition_root is true, the root partitioned table's column list is used. Otherwise, if publish_via_partition_root is false (the default), each partition's column list is used.

For a detailed description of this feature, see docs and blog.

Allows logical replication to run as the owner of the subscription:

Previously, the subscription's APPLY process will run with the privileges of a superuser but now with PostgreSQL 15, it will run with the privileges of the subscription owner. So, this would prevent logical replication workers from performing insert, update, delete, truncate, or copy commands on tables unless the subscription owner has permission to do so. We allow only superusers, roles with bypassrls, and table owners can replicate into tables with row-level security policies.

The purpose of this work is to allow subscriptions to be managed by non-superusers and protect servers with subscriptions from malicious activity on the publisher side.

Conflict Resolution:

The conflicts can happen due to various reasons like PRIMARY KEY violation, schema being different, etc. during apply of transactions in subscriber. By default, PostgreSQL will keep retrying the operation on an error. Before PostgreSQL 15, users have the following options (a) They can manually remove the conflicting data to allow replication to proceed. (b) Use pg_replication_origin_advance() to advance the LSN to a location beyond the failed transaction so that on restart replication starts from a point after the conflicting transaction.

It is quite inconvenient for users to use any of these methods because for option (a) users are forced to remove/change data on subscribers even though they want corresponding data from the publisher to be ignored. To use option (b), users need to find the LSN of the failing transaction probably by using pg_waldump or some other tool on the publisher side, and also the origin information is not apparent as it was generated internally for the purpose of replication. While using pg_replication_origin_advance(), if users by mistake set the wrong LSN (either of a future commit or of some operation in-between the transaction) then the system can omit the data that it was not supposed to leading to an inconsistent replica.

The other problem is that the system will keep retrying to apply the transaction even when it can't succeed without the users intervention and users don't have any way to stop it apart from manually disabling the subscription by using ALTER SUBSCRIPTION mysub DISABLE;

In PostgreSQL 15, we tried to make the use of pg_replication_origin_advance() easier by providing the required information and by providing a similar but more robust way. The other feature it provides is to allow subscriptions to be disabled on error.

Introduced a new subscription option 'disable_on_error' which allows subscription to be automatically disabled if any errors are detected by subscription workers during data replication from the publisher. Users can specify this option either during CREATE SUBSCRIPTION or in ALTER SUBSCRIPTION command.

CREATE SUBSCRIPTION mysub CONNECTION '…' PUBLICATION mypub WITH (disable_on_error = true);

ALTER SUBSCRIPTION mysub SET (disable_on_error = true);

Then, we extended the error context information of subscription worker error by adding (a) Finish LSN. It will indicate commit_lsn for committed transactions, and prepare_lsn for prepared transactions. (b) Replication origin name. This will contain the name of the replication origin that keeps track of replication progress and is created automatically with the subscription definition.

The extended error context information can make the use of pg_replication_origin_advance() easier for users.

Then, we also introduced a more robust way to skip the conflicting transactions by using the command: ALTER SUBSCRIPTION mysub SKIP (lsn = '0/1566D10'); This is more robust because it will prevent users to set some wrong LSN by performing checks against the specified LSN. We do ensure that the specified LSN must be the same as the first transaction's finish LSN that is sent by the publisher after the restart. The first transaction successfully applied to the subscriber will clear the specified LSN. We also ensure that the specified LSN must be greater than the origin's current LSN.

For a detailed description of this feature, see blogs [1] and [2].

pg_stat_subscription_stats:

A new view that shows stats about errors that occurred during the application of logical replication changes or during initial table synchronization. See docs for more information on this.

Communication improvements between publisher and subscriber:

PostgreSQL has made enhancements in communication to (a) prevent sending the transaction BEGIN/END messages where all the transaction data is filtered, and (b) prevent replication to restart due to timeouts while processing large transactions where most or all the data is filtered.

Before PostgreSQL 15, we use to send BEGIN/END messages for empty transactions (where all changes are skipped/filtered) which waste a lot of CPU cycles and network bandwidth to build and transmit such messages. To avoid sending messages for empty transactions, we started sending the BEGIN message only with the first change transmitted from publisher to subscriber and then we allow to send the END (COMMIT) message only when BEGIN is sent. To avoid any delays in synchronous replication, we do send a keepalive message after skipping an empty transaction and process its feedback.

While processing long transactions where most of the changes are filtered due to say the particular operations are not published, the publisher doesn't send any communication to the subscriber which times out after a certain threshold time leading to a restart of replication. To fix this, we start periodically sending keep_alive messages in such cases.

For a detailed description of this work, see the blog.

I believe this is a good mix of improvements for logical replication in PostgreSQL 15 which will help users. Your feedback here or on PostgreSQL mailing lists is welcome!