Enabling Parallel Apply
NOTICE: created replication slot "sub" on publisher
CREATE SUBSCRIPTION
---------+-----------
sub | p
(1 row)
You can control the degree of parallelism using the max_parallel_apply_workers_per_subscription
parameter. By default, this is set to 2, allowing two large transactions to be applied in parallel per subscription.
How Parallel Apply Works
Parallel apply significantly improves the efficiency of large transaction handling by not waiting for the COMMIT message from the publisher before applying changes. Previously, the subscriber would write all streamed data to temporary files and apply them only after receiving the commit record.
Architecture Overview
When the first stream of a large transaction arrives, the leader apply worker (LA) assigns a parallel apply worker (PA)—if available. The LA sends changes to the PA via a dedicated shared memory queue (shm_mq
). The PA applies the changes and waits for the next stream. This worker remains assigned until the transaction is committed.
At commit time, the LA waits for the PA to finish to ensure commit order is preserved, which is critical to avoid:
- Transaction dependencies: e.g., inserting a row in one transaction and updating it in another.
- Deadlocks: e.g., transactions updating the same rows/tables in reverse order.
Locking Considerations
Parallel apply introduces potential deadlocks, especially when transactions are independent on the publisher but become interdependent on the subscriber due to schema differences (e.g., constraints, indexes).
To detect such deadlocks between LA and PA workers, PostgreSQL uses heavy-weight locks during:
- PA waiting for the next stream
- LA waiting for PA to finish
Example 1: Deadlock Due to Conflicting Changes
Consider a scenario where a table lacks a unique key on the publisher but has one on the subscriber. If:
- PA is applying TX-1
- LA is applying TX-2
...and both transactions conflict due to the unique constraint, a deadlock can occur:
- LA waits for a lock on the unique index
- PA waits for the next stream from LA
- → Deadlock cycle: LA → PA → LA
PostgreSQL detects and reports such deadlocks. Users can skip problematic transactions using: ALTER SUBSCRIPTION ... SKIP ... command.
Example 2: Deadlock Due to shm_mq
Buffer Full
In a variant of the previous example:
- PA-1 and PA-2 apply TX-1 and TX-2 respectively.
- PA-2 waits for PA-1 to finish.
- PA-1 waits for input from LA.
- If the
shm_mq
buffer between LA and PA-2 is full, LA cannot send messages.
To avoid this:
- LA uses non-blocking writes with a timeout.
- If the timeout is exceeded, LA serializes pending messages to a file.
- PA-2 reads from the file, freeing LA to continue processing PA-1.
This mechanism prevents deadlocks caused by full shm_mq
buffers.
No comments:
Post a Comment