With the introduction of failover slots in PostgreSQL 17, logical replication can now be used effectively in high-availability environments. This feature ensures that logical replication continues seamlessly even if the publisher node fails and its physical standby is promoted to become the new publisher.
To maintain continuous replication after a failover, you need to configure the following parameters and options for both the logical and physical replication setups (a consolidated configuration sketch follows the list):
1. Enable Failover Property on Slots:
• Set the failover property for the slots corresponding to the subscriptions that need to continue receiving changes after a failover. This can be done using the failover option in CREATE SUBSCRIPTION or the failover parameter in the pg_create_logical_replication_slot() SQL API. These slots will be referred to as failover slots.
2. Synchronize Replication Slots:
• On the standby, set the sync_replication_slots parameter to on. This ensures that the failover slots are periodically synchronized by the slot sync worker.
3. Configure Physical Replication Slot:
• Ensure a physical replication slot exists between the primary and the standby: set primary_slot_name on the standby and make sure a slot with that name exists on the primary. Also, enable hot_standby_feedback on the standby to prevent the removal of rows that are still needed by the standby.
4. Prevent Subscriber Overrun:
• On the primary, set the synchronized_standby_slots parameter to include the physical replication slot. This ensures that failover slots do not send changes to the subscriber until the physical standby has received them, so the subscriber can never get ahead of the standby.
5. Update Connection Info:
• Modify primary_conninfo on the standby to include dbname=postgres. This is necessary for the slot sync worker to connect to the primary.
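Taken together, a minimal configuration sketch looks like this (mysub, mypub, and sb1_slot are the names used in the example below; the host and port values are placeholders that will vary per setup):
-- On the subscriber: create the subscription with the failover option (step 1)
CREATE SUBSCRIPTION mysub CONNECTION 'host=node_a dbname=postgres' PUBLICATION mypub WITH (failover = true);
-- On the primary: create the physical slot and guard against subscriber overrun (steps 3 and 4)
SELECT pg_create_physical_replication_slot('sb1_slot');
ALTER SYSTEM SET synchronized_standby_slots = 'sb1_slot';
-- On the standby: enable slot synchronization (steps 2, 3, and 5)
ALTER SYSTEM SET sync_replication_slots = on;
ALTER SYSTEM SET primary_slot_name = 'sb1_slot';
ALTER SYSTEM SET hot_standby_feedback = on;
ALTER SYSTEM SET primary_conninfo = 'host=node_a dbname=postgres';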
Example Setup
Here's an example demonstrating logical replication continuity after failover using three nodes: Node-A (primary server/publisher), Node-B (physical standby for Node-A), and Node-C (subscriber for Node-A).
Initialize the publisher database cluster:
initdb -D <data_pub_path>
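Start the publisher server (this example assumes it runs on the default port 5432):
pg_ctl start -D <data_pub_path>
On Node-A, create a table to replicate, seed it with a couple of rows, and create a publication (the publication name mypub is an assumed placeholder; mytbl and its column c1 match the queries later in this example):
CREATE TABLE mytbl (c1 int);
INSERT INTO mytbl VALUES (generate_series(1, 2));
CREATE PUBLICATION mypub FOR TABLE mytbl;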
Initialize the subscriber database cluster:
initdb -D <data_sub_path>
pg_ctl start -o "-p 5444" -D <data_sub_path>
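On Node-C, create a subscription with the failover option enabled so that its slot survives promotion (the connection string assumes Node-A on the default port, and mypub is the publication assumed above):
CREATE SUBSCRIPTION mysub CONNECTION 'dbname=postgres port=5432' PUBLICATION mypub WITH (failover = true);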
Verify Replication
On Node-A (Publisher), check that the subscription's replication slot was created with the failover property enabled:
SELECT slot_name, failover FROM pg_replication_slots;
On Node-C (Subscriber):
SELECT * FROM mytbl;
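Assuming the two seed rows inserted on Node-A above, this should return:
c1
----
 1
 2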
Node-B (Physical Standby)
Create a base backup:
pg_basebackup -D <data_standby_path> -R -X stream -P
Start the standby server:
pg_ctl start -o "-p 5555" -D <data_standby_path>
Create a physical replication slot:
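One way to create it, run on Node-A (the slot name matches primary_slot_name set below):
SELECT pg_create_physical_replication_slot('sb1_slot');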
Set the following parameters:
ALTER SYSTEM SET sync_replication_slots = on;
ALTER SYSTEM SET primary_slot_name = 'sb1_slot';
ALTER SYSTEM SET hot_standby_feedback = on;
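Per steps 4 and 5 above, also make sure primary_conninfo on Node-B contains a database name, and register the physical slot in synchronized_standby_slots on Node-A (the connection values here are placeholders for this local setup):
-- On Node-B, replacing the primary_conninfo written by pg_basebackup -R:
ALTER SYSTEM SET primary_conninfo = 'port=5432 dbname=postgres';
-- On Node-A:
ALTER SYSTEM SET synchronized_standby_slots = 'sb1_slot';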
Restart the standby server and ensure the failover slot is synced:
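pg_ctl restart -D <data_standby_path>
Then, on Node-B, the failover slot should appear with synced set to true: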
SELECT slot_name, failover, synced FROM pg_replication_slots;
Stop the primary server (Node-A):
pg_ctl stop -D <data_pub_path>
Promote the standby server (Node-B):
pg_ctl promote -D <data_standby_path>
Disable the subscription:
ALTER SUBSCRIPTION mysub DISABLE;
Change the connection to the new primary:
ALTER SUBSCRIPTION mysub CONNECTION 'dbname=postgres port=5555';
Enable the subscription:
ALTER SUBSCRIPTION mysub ENABLE;
On Node-B (New Primary):
INSERT INTO mytbl VALUES (generate_series(3, 4));
On Node-C (Subscriber), verify that the new changes were replicated:
SELECT * FROM mytbl;
c1
----
1
2
3
4
Slot synchronization on a physical standby works by fetching logical failover slot information from the primary server, creating the slots on the standby, and synchronizing them periodically. This can be done automatically by enabling the slot sync worker or manually using the pg_sync_replication_slots() function.
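For example, a manual synchronization cycle can be triggered on the standby like this (it connects using primary_conninfo, so that setting must include a dbname):
SELECT pg_sync_replication_slots();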
If the WAL corresponding to the remote slot's restart_lsn is unavailable on the standby, or the remote slot's catalog_xmin precedes the standby's oldest xid, the synchronized slot cannot be persisted yet and is kept in a temporary state (RS_TEMPORARY). Once the primary slot catches up, the local slot is persisted and marked RS_PERSISTENT.
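One way to observe this, sketched against the pg_replication_slots view, is to watch the temporary column for the synced slot (temporary corresponds to RS_TEMPORARY):
SELECT slot_name, synced, temporary FROM pg_replication_slots;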