Sunday, 29 November 2015

Parallel Sequential Scans in play


Parallelism is now reality in PostgreSQL.  With 9.6, I hope we will see many
different form of queries that can use parallelism to execute.  For now, I will
limit this discussion to what we can already do, which is Parallel Sequential
Scans.

Parallel Sequential Scans are used to scan a relation parallely with the help of
background workers which in turns improve the performance of such scans.  I
will discuss about the scenarios where user can expect a performance boost
due to this feature later in this blog, but first let us understand the basic feature
and how it works.  Three new GUC parameters have been added to tune the
usage of this feature.

max_parallel_degree - This is used to set the maximum number of workers that
can be used for an individual parallel operation.  It is very well possible that the
requested number of workers are not available at execution time.  Parallel workers
are taken from the pool of processes established by max_worker_processes which
means that value of max_parallel_degree should be lesser than max_worker_processes.
It might not be useful to set the value of this parameter more than the number of CPU
count on your system.

parallel_tuple_cost - This is used by planner to estimate the cost of transferring a
tuple from parallel worker process to master backend.  The default is 0.1.  The more
the number of tuples that needs to be passed from worker backend processes to
master backend process, the more this cost will be and more overall cost of
parallel sequential scan plan.

parallel_setup_cost - This is used by planner to estimate the cost of launching parallel
worker processes and setting up dynamic shared memory to communicate.
The default is 1000.

Now let us see the simple example to demonstrate how parallel sequential scan works:
 create table tbl_parallel_test(c1 int, c2 char(1000));   
 insert into tbl_parallel_test values(generate_series(1,1000000),'aaaaa');   
 Analyze tbl_parallel_test;   
 Explain analyze select * from tbl_parallel_test where c1 < 10000 and  
 c2 like '%bb%';   
               QUERY PLAN              
  -------------------------------------------------------------------------------------------------------------   
  Seq Scan on tbl_parallel_test  
          (cost=0.00..157858.09 rows=1 width=1008)  
          (actual time=378.414..378.414 rows=0 loops=1)   
   Filter: ((c1 < 10000) AND (c2 ~~ '%bb%'::text))   
   Rows Removed by Filter: 1000000   
  Planning time: 0.075 ms   
  Execution time: 378.431 ms   
  (5 rows)   

Set the max parallel degree to enable the use of parallelism in queries.
 set max_parallel_degree = 6;  
 Explain analyze select * from tbl_parallel_test where c1 < 10000  
 and c2 like '%bb%';  
                                QUERY PLAN                    
 -------------------------------------------------------------------------------------------------------------  
  Gather (cost=1000.00..29701.57 rows=1 width=1008)   
        (actual time=182.708..182.708 rows=0 loops=1)  
   Number of Workers: 5  
   -> Parallel Seq Scan on tbl_parallel_test  
         (cost=0.00..28701.47 rows=1 width=1008)  
         (actual time=179.496..1081.120 rows=0 loops=1)  
      Filter: ((c1 < 10000) AND (c2 ~~ '%bb%'::text))  
      Rows Removed by Filter: 1000000  
  Planning time: 0.078 ms  
  Execution time: 200.610 ms  
 (7 rows)  

Here, we can see how changing max_parallel_degree allows the usage of parallel workers
to perform parallel sequential scans.  We can notice in above example that even though we
have set max_parallel_degree as 6, still it uses 5 workers and the reason for same is that
currently the parallel workers are choosen based on size of relation.

Next, let us discuss about usage of functions in parallel query. A new clause PARALLEL
is added to the CREATE FUNCTION statement.  There are three valid values that can be
used by user with this clause.

1. PARALLEL Unsafe - This indicates that the function can't be executed in parallel mode
and the presence of such a function in a SQL statement forces a serial execution plan.
2. PARALLEL Restricted - This indicates that the function can be executed in parallel mode,
but the execution is restricted to parallel group leader.  As of now, if the qualification for any
particular relation has anything that is parallel restricted, that relation won't be chosen for
parallelism.
3. Parallel Safe - This indicates that the function is safe to run in parallel mode without
restriction.

The default value for function is PARALLEL Unsafe.

Now let us see the impact of using Parallel Safe and Unsafe function in the queries.  I will
continue using the query used in previous example to explain the concept.

Create a Parallel Safe function
 create or replace function calc_factorial(a integer, fact_val integer)  
 returns integer   
  as $$   
  begin   
    perform (fact_val)!;   
    return a;   
  end;   
  $$ language plpgsql PARALLEL Safe;  
Use it in query
 Explain analyze select * from tbl_parallel_test where  
               c1 < calc_factorial(10000, 10)   
               and c2 like '%bb%';   
        QUERY PLAN   
  --------------------------------------------------------------------------------   
  Gather (cost=1000.00..75154.99 rows=1 width=1008)   
     (actual time=120566.456..120566.456 rows=0 loops=1)   
   Number of Workers: 5   
   -> Parallel Seq Scan on tbl_parallel_test   
      (cost=0.00..74154.89 rows=1 width=1008)   
      (actual time=119635.421..359721.498 rows=0 loops=1)   
    Filter: ((c2 ~~ '%bb%'::text) AND (c1 < calc_factorial(10000, 10)))   
    Rows Removed by Filter: 1000000   
  Planning time: 54.904 ms   
  Execution time: 120622.631 ms   
  (7 rows)   

Here we can see that Parallel Plan is chosen and the parallel safe function
is pushed to workers for evaluation of quals.

Now lets change that function as Parallel Unsafe and see how the above
query behaves.

  Alter Function calc_factorial(integer, integer) PARALLEL Unsafe;   
  Explain analyze select * from tbl_parallel_test where  
              c1 < calc_factorial(10000, 10)   
              and c2 like '%bb%';   
         QUERY PLAN   
  --------------------------------------------------------------------------------   
  Seq Scan on tbl_parallel_test   
     (cost=0.00..407851.91 rows=1 width=1008)   
     (actual time=33166.138..33166.138 rows=0 loops=1)   
   Filter: ((c2 ~~ '%bb%'::text) AND (c1 < calc_factorial(10000, 10)))   
   Rows Removed by Filter: 1000000   
  Planning time: 0.162 ms   
  Execution time: 33166.208 ms   
  (5 rows)   

So using parallel unsafe functions in queries would lead to serial plans.

Next, let us see the Performance characteristics of Parallelism:

Non-default settings used to collect performance data:
 shared_buffers=32GB; min_wal_size=5GB; max_wal_size=10GB  
 checkpoint_timeout =30min; max_connections=300;  
 max_worker_processes=100;  

Test setup
 create table tbl_perf(c1 int, c2 char(1000));  
 insert into tbl_perf values(generate_series(1,30000000),'aaaaa');  
 Explain analyze select c1 from tbl_perf where  
              c1 > calc_factorial($1,10) and  
              c2 like '%aa%';  
The function calc_factorial is same as used in previous example and the values passed
to it are such that the desired percentage of rows can be selected.  Example
 --"to select 1% of rows, below query can be used"  
 Explain analyze select c1 from tbl_perf where  
              c1 > calc_factorial(29700000,10) and  
              c2 like '%aa%';"  
 --"to select 10% of rows, below query can be used"  
 Explain analyze select c1 from tbl_perf where  
              c1 > calc_factorial(27000000,10) and  
              c2 like '%aa%';"  
 --"to select 25% of rows, below query can be used"  
 Explain analyze select c1 from tbl_perf where  
              c1 > calc_factorial(22500000,10) and  
              c2 like '%aa%';"  
Performance Data -





















1. With increase in degree of parallelism (more parallel workers), the time to complete
the execution reduces.
2. Along with workers, master backend also participates in execution due to which you
can see more time reduction in some cases.
3. After certain point, increasing max parallel degree won't help.

The cases we have seen in this blog are mostly the cases where parallel query helps by
using the workers, however there exists some cases like when qualification is very cheap
where it hurts or won't help even by employing more number of workers.  There is
more investigation needed to make sure that planner won't choose such plans for parallelism.

4 comments:

  1. Just curious, what is the relationship to CPU cores here? Presumably your system under test has 64 cores, commensurate with the DOP being tested?

    ReplyDelete
    Replies
    1. The point about CPU cores is that if we employ workers more than CPU count in system for doing parallel query, it might not help because the workers will not be able to run parallely considering most of the work is CPU-intensive.
      As far as the system under test is concerned, it has more than 64 CPU count, the reason for showing the test with max_parallel_degree
      as 64 is not related to CPU count, rather it was just to show the
      behaviour of parallel scans at different values of parallel degree.

      Delete
  2. The non-parallel, "PARALLEL UNSAFE" function query is almost 4x faster than the parallel/safe version... What's going on there? plpgsql issue?


    Thanks for your work on this! Curious to see what else can be parallel in 9.6 (sorts, joins, ...). :-)

    ReplyDelete
    Replies
    1. I have done that test on my local Windows m/c without appropriate
      settings in postgresql.conf. The purpose of that test was just to
      show the usage of Parallel Safe functions with parallel query.
      Anyway today, I have done the tests on the Power-8 m/c with
      appropriate settings (shared_buffers=32GB) and the results shows
      that Parallel Safe version took less than half the time as compare to
      Parallel Unsafe version. Results of tests are as below:

      postgres=# Explain analyze select * from tbl_parallel_test where c1 < calc_factorial(10000, 10) and c2 like '%bb%';
      QUERY PLAN
      ---------------------------------------------------------------------------------------------------------------------------------------
      Gather (cost=1000.00..75155.28 rows=1 width=1008) (actual time=1532.915..1532.915 rows=0 loops=1)
      Number of Workers: 5
      -> Parallel Seq Scan on tbl_parallel_test (cost=0.00..74155.18 rows=1 width=1008) (actual time=1528.242..9175.564 rows=0 loops=1)
      Filter: ((c2 ~~ '%bb%'::text) AND (c1 < calc_factorial(10000, 10)))
      Rows Removed by Filter: 1000000
      Planning time: 0.078 ms
      Execution time: 1562.325 ms
      (7 rows)

      Alter Function calc_factorial(integer, integer) PARALLEL Unsafe;

      postgres=# Explain analyze select * from tbl_parallel_test where c1 < calc_factorial(10000, 10) and c2 like '%bb%';
      QUERY PLAN
      -------------------------------------------------------------------------------------------------------------------------
      Seq Scan on tbl_parallel_test (cost=0.00..407853.49 rows=1 width=1008) (actual time=4901.296..4901.296 rows=0 loops=1)
      Filter: ((c2 ~~ '%bb%'::text) AND (c1 < calc_factorial(10000, 10)))
      Rows Removed by Filter: 1000000
      Planning time: 0.096 ms
      Execution time: 4901.383 ms
      (5 rows)

      Delete