Sunday, 10 May 2020

Improved (auto)vacuum in PostgreSQL 13

Vacuum is one of the sub-systems in PostgreSQL which gets improved with each release.  I have checked past five releases and each has quite a few improvements for vacuum.  Following the trend, there are a number of improvements in vacuum in the upcoming PostgreSQL release (v13) which are covered in this blog.

Improvement-1
---------------------
Vacuum will be allowed to process indexes in parallel. This means it can leverage multiple CPUs in order to perform index cleanup. This internally uses the background worker infrastructure of PostgreSQL to accomplish the work.  There is a new option PARALLEL which controls the parallelism used by vacuum.  Users can use the new option to specify the number of workers that can be used to perform the vacuum command which is limited by the minimum of (a) the number of indexes on a table and (b) max_parallel_maintenance_workers.  The parallelism for vacuum command is enabled by default which means it will be used even if the user didn't specify the PARALLEL option and it uses the number of workers equal to the number of indexes on the table being vacuumed.  We can disable parallelism for vacuum by specifying zero as the number of workers with the PARALLEL option.  The index can participate in parallel
vacuum iff it's size is greater than min_parallel_index_scan_size.

The PARALLEL option can't be used with the FULL option in vacuum command.  This feature won't be available via autovacuum, users need to use vacuum command to get the benefit provided by this feature.

There are a number of other blogs [1][2] written on this topic which shows the benefit of this feature.  A recent blog published by EnterpriseDB shows that vacuum could be approximately 4 times faster by using 7 workers especially when the relation is in dire need of vacuum, read that blog for more information about the test.

Here is a simple test to show the usage of this feature.
postgres=# create table pvac(c1 int, c2 text DEFAULT md5(random()::text), c3 text DEFAULT md5(random()::text));
CREATE TABLE
postgres=# create index pvac_1 on pvac(c1);
CREATE INDEX
postgres=# create index pvac_2 on pvac(c2);
CREATE INDEX
postgres=# create index pvac_3 on pvac(c3);
CREATE INDEX
postgres=# insert into pvac select i FROM generate_series(1,100000) as i;
INSERT 0 100000
postgres=# update pvac set c1=c1;
UPDATE 100000

postgres=# vacuum (parallel 4, verbose) pvac;
INFO:  vacuuming "public.pvac"
INFO:  launched 2 parallel vacuum workers for index vacuuming (planned: 2)
INFO:  scanned index "pvac_1" to remove 100000 row versions
..
..

In the above test, 2 parallel workers were used even though we have specified 4 and the reason is that the number of indexes is 3, and the number of workers can't be more than index which is further limited by max_parallel_maintenance_workers as you can see by command below.

postgres=# show max_parallel_maintenance_workers;
 max_parallel_maintenance_workers 
----------------------------------
 2
(1 row)

The other most important thing we have ensured in this feature is that the parallelism won't use more memory or I/O bandwidth as compared to non-parallel vacuum.

Improvement-2
---------------------
Allow inserts to trigger autovacuum activity.  This feature will be really helpful for the insert-only tables where anti-wraparound vacuums could be the first vacuum that the table ever receives and such a run would take a really long time.   This allows heap pages to be set as all-visible, which then allows index-only scans to skip heap fetches, and reduces the work necessary when the table needs to be frozen.  This is controlled by two new GUCs and reloptions; autovacuum_vacuum_insert_threshold and autovacuum_vacuum_insert_scale_factor.

Let me demonstrate the benefit of this feature with the help on an example.  Start the server with autovacuum_vacuum_insert_threshold = -1 (one can edit postgresql.conf file to change value of this parameter or can use Alter System command).  By connecting with psql, we can execute below commands to see the behavior.

postgres=# show autovacuum_vacuum_insert_threshold;
 autovacuum_vacuum_insert_threshold 
------------------------------------
 -1
(1 row)

postgres=# create table vac_ins(c1 int, c2 char(500));
CREATE TABLE
postgres=# create index idx_vac_ins on vac_ins(c1);
CREATE INDEX
postgres=# insert into vac_ins values(generate_series(1,20000),'aaaaaa');
INSERT 0 20000

After a few seconds, you can notice the below message in the server log which shows that autovacuum has performed analyze on the table.

LOG:  automatic analyze of table "postgres.public.vac_ins" system usage: CPU: user: 0.03 s, system: 0.11 s, elapsed: 0.15 s

After that, run below command:

postgres=# explain (analyze) select c1 from vac_ins where c1 < 100;
                                                         QUERY PLAN                                                          
-----------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using idx_vac_ins on vac_ins  (cost=0.29..16.02 rows=99 width=4) (actual time=0.019..0.092 rows=99 loops=1)
   Index Cond: (c1 < 100)
   Heap Fetches: 99
 Planning Time: 0.269 ms
 Execution Time: 0.129 ms
(5 rows)

Here, "Heap Fetches: 99" shows that the above query need to visit heap to fetch the required information even though it is present in index and the scan type used is Index Only Scan.

postgres=# truncate vac_ins;
TRUNCATE TABLE

Now, restart the server with a default value of autovacuum_vacuum_insert_threshold and execute below commands from psql to see how the new feature helps:

postgres=# show autovacuum_vacuum_insert_threshold;
 autovacuum_vacuum_insert_threshold 
------------------------------------
 1000
(1 row)

postgres=# insert into vac_ins values(generate_series(1,20000),'aaaaaa');
INSERT 0 20000

After a few seconds, you can notice the below message in the server log which indicates that autovacuum has performed both vacuum and analyze on the table.
LOG:  automatic vacuum of table "postgres.public.vac_ins": index scans: 0
pages: 0 removed, 1334 remain, 0 skipped due to pins, 0 skipped frozen
...
LOG:  automatic analyze of table "postgres.public.vac_ins" system usage: CPU: user: 0.01 s, system: 0.15 s, elapsed: 0.21 s

After that, run below command:

postgres=# explain (analyze) select c1 from vac_ins where c1 < 100;
                                                         QUERY PLAN                                                         
----------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using idx_vac_ins on vac_ins  (cost=0.29..6.02 rows=99 width=4) (actual time=0.016..0.039 rows=99 loops=1)
   Index Cond: (c1 < 100)
   Heap Fetches: 0
 Planning Time: 0.166 ms
 Execution Time: 0.074 ms
(5 rows)

Here, "Heap Fetches: 0" shows that the above query doesn't need to visit heap to fetch the required information. We can see that the time to complete the execution is reduced significantly in this case.

Improvement-3
---------------------
Allow an (auto)vacuum to display additional information about the heap or index in case of an error.  This feature could help users in case the database has some corruption.  For example, if one of the indexes on a relation has some corrupted data (due to bad hardware or some bug), it will let the user know the index information, and the user can take appropriate action like either Reindex or maybe drop and recreate that particular index to overcome the problem.  In the case of the heap, it displays the block number for which an error has occurred which makes it much easier for users and developers to detect the problem.  In the worst case, if any particular block is corrupted in a table, users can remove all the rows of that particular block and the table can be used.

vacuum pvac;
ERROR:  error induced to demonstrate use of information
CONTEXT:  while scanning block 2469 of relation "public.pvac"

Next, I used "SELECT * FROM heap_page_items(get_raw_page('pvac', 2469));" to find the information of all line pointers in the page and then removed them from using below query. 

postgres=# delete from pvac where ctid Between '(2469,1)' and '(2469,11)';
DELETE 11

Then, I again ran vacuum

postgres=# vacuum pvac;
ERROR:  error induced to demonstrate use of information
CONTEXT:  while scanning block 2468 of relation "public.pvac"

Now, you can see the block number is changed from 2469 to 2468 which means the vacuum could proceed.  As, in this case, I have manually induced the error by changing code, so it occurs for every block but in reality it would be for some particular block(s) and once user can get rid of those block(s), the table can be reused.  I don't want to say that is an ideal situation but at least it will allow users to proceed and it can help developers to narrow down the bug if there is any in the code.

Improvement-4
---------------------
Autovacuum will now log WAL usage statistics along with other information.  The WAL usage contains information on the total number of records, number of full pages images, and the total number of bytes.  The buffer usage and WAL usage stats combined gives us approximate usage of I/O by a particular autovacuum run. I took one example runs information which will be shown below, one can refer to "WAL usage" to check the information on newly added stats.

LOG:  automatic vacuum of table "postgres.public.pvac": index scans: 1
pages: 0 removed, 2470 remain, 0 skipped due to pins, 0 skipped frozen
tuples: 100000 removed, 100000 remain, 0 are dead but not yet removable, oldest xmin: 529
buffer usage: 9276 hits, 4 misses, 3 dirtied
avg read rate: 0.062 MB/s, avg write rate: 0.047 MB/s
system usage: CPU: user: 0.40 s, system: 0.00 s, elapsed: 0.50 s
WAL usage: 7909 records, 2 full page images, 2276323 bytes

Improvement-5
---------------------
Make vacuum buffer counters 64-bits wide to avoid overflow of buffer usage stats.  Without this feature, in extreme cases, if there exist tables that are large enough for 4 billion buffer accesses to be a possibility, the stats displayed are meaningless.  See below example of 'buffer usage' stats from pgsql-hackers:

LOG:  automatic vacuum of table "somtab.sf.foobar": index scans: 17
pages: 0 removed, 207650641 remain, 0 skipped due to pins, 13419403 skipped frozen
tuples: 141265419 removed, 3186614627 remain, 87783760 are dead but not yet removable
buffer usage: -2022059267 hits, -17141881 misses, 1252507767 dirtied
avg read rate: -0.043 MB/s, avg write rate: 3.146 MB/s
system usage: CPU 107819.92s/2932957.75u sec elapsed 3110498.10 sec

Improvement-6
---------------------
Add wait event 'VacuumDelay' to report on cost-based vacuum delay.  This will help us to monitor the [auto]vacuum throttling.  This is a small feature but quite important as right now we have no way to monitor [auto]vacuum throttling and it is not very uncommon to see this in user environments.