Comments on PostgreSQL and Databases in general: Parallelism, what next?

That is unfortunate and the first time I have ever...

2020-09-10T15:42:56.875-07:00

That is unfortunate, and the first time I have ever heard bad news from PG developers. Thank you, and everyone else who has tried to tackle the big, hard problems that make a big difference. Hopefully there will be a blog post in the future on what remains to be done and how others can help get it over the line, as it seems like such a critical piece of work. Again, thank you to everyone for their hard and amazing work on this and all things PG over the years.

As far as I know nobody is actively working on it ...

2020-09-05T03:36:22.839-07:00

As far as I know nobody is actively working on it at this stage.

I haven't seen any update on zheap for a long while...

2020-09-05T02:51:43.287-07:00

I haven't seen any update on zheap for a long while, and the GitHub repository doesn't look active. Is the project still being actively worked on? Also, last I read, PG14 might be the earliest release where we could take it for a spin; is that still right?

Thanks for the insight. Would be a very useful fea...

2020-02-27T14:14:48.539-08:00

Thanks for the insight. Would be a very useful feature to have.

To make them parallel, we need to first do some in...

2020-02-27T03:59:15.674-08:00

To make them parallel, we first need to do some infrastructure work, such as:
a. Change the locking mechanism so that parallel processes block each other for certain types of heavyweight locks, such as the relation extension lock and the page lock. This can allow inserts.
b. For updates/deletes, I think we need a shared combo CID hash, so that all participating processes know about the combo CIDs.
c. Then, we might need to do something about tuple locks, depending on how we implement parallel updates/deletes.

After that, the actual work to make writes parallel might not be much.

Any thoughts on parallel queries that write data (...

2020-02-26T14:33:34.750-08:00

Any thoughts on parallel queries that write data (Insert/Update)?

This article is old but shows how to use skip scan...

2020-02-26T06:57:36.606-08:00

This article is old, but it shows how to use a skip-scan index search algorithm on any query. It has the side benefit that any query can be run in a minimum of two parallel processes. https://waa.ai/TsW0

I think it depends if the CTE can be inlined, then...

2020-02-22T05:07:57.069-08:00

I think it depends: if the CTE can be inlined, then it can use parallelism for the entire statement. Based on your example, I constructed a simple test, and the result is below:
postgres=# Explain (Costs off) WITH a AS (select * from t1), b AS (Select * from t2) Select * from a, b;
              QUERY PLAN
-------------------------------------
 Gather
   Workers Planned: 2
   ->  Nested Loop
         ->  Parallel Seq Scan on t1
         ->  Seq Scan on t2
(5 rows)
So such CTEs can use parallel plans, and the entire query can be parallelized. However, if for some reason the CTE can't be inlined, then parallelism can be used only for part of the statement. See the example below:
postgres=# Explain (Costs off) WITH a AS Materialized (select * from t1), b AS Materialized (Select * from t2) Select * from a, b;
               QUERY PLAN
---------------------------------------
 Nested Loop
   CTE a
     ->  Gather
           Workers Planned: 2
           ->  Parallel Seq Scan on t1
   CTE b
     ->  Seq Scan on t2
   ->  CTE Scan on a
   ->  CTE Scan on b
(9 rows)
The CTE scan itself can't be parallelized as of now.

Does this help?

I was thinking about this kind of parallelism: WI...

2020-02-21T10:59:24.875-08:00

I was thinking about this kind of parallelism:

WITH a AS (
select * from x
), b AS (
select * from y
), c AS (
select * from z
), d AS (
select * from a
), e AS (
select * from a, c
) /* f */
SELECT * FROM e, d

Thread1: a,d,
Thread2: b,-,
Thread3: c,e,f

Is it useful? I think so:)
But is it possible?

Good point, Thomas!

2020-02-19T21:16:04.846-08:00

Good point, Thomas!

Note that CTEs can finally be inlined in PostgreSQ...

2020-02-19T20:59:36.640-08:00

Note that CTEs can finally be inlined in PostgreSQL 12, which means that using WITH syntax doesn't necessarily prevent parallelism anymore! Parallelising materialised CTEs would require a bunch more machinery.
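
A minimal sketch of when inlining applies, assuming a hypothetical table t1 created just for the illustration. As far as the PostgreSQL 12 documentation describes it, a non-recursive, side-effect-free CTE is inlined by default only when the query references it once. NOT MATERIALIZED requests inlining even when it is referenced more than once, and MATERIALIZED keeps the old behaviour. Whether a parallel plan is actually chosen still depends on table size and cost settings, so no expected plans are shown.

```sql
-- Hypothetical table, large enough that the planner may consider a parallel scan.
CREATE TABLE t1 AS
SELECT g AS id, g % 100 AS grp
FROM generate_series(1, 1000000) AS g;
ANALYZE t1;

-- Referenced once: inlined by default in PostgreSQL 12+, so the scan of t1
-- is planned as part of the outer query and is eligible for parallelism.
EXPLAIN (COSTS OFF)
WITH a AS (SELECT * FROM t1)
SELECT count(*) FROM a WHERE grp = 7;

-- Referenced twice: materialised by default; NOT MATERIALIZED requests inlining.
EXPLAIN (COSTS OFF)
WITH a AS NOT MATERIALIZED (SELECT * FROM t1)
SELECT count(*) FROM a AS x JOIN a AS y USING (id);

-- MATERIALIZED keeps the CTE as a separate node read through a CTE Scan.
EXPLAIN (COSTS OFF)
WITH a AS MATERIALIZED (SELECT * FROM t1)
SELECT count(*) FROM a WHERE grp = 7;
```

Comparing the three plans on your own data is probably the quickest way to see which parts of a statement end up under a Gather node.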

I could see that array_agg(...), json*_agg(...), j...

2020-02-19T20:18:34.790-08:00

I can see that array_agg(...), json*_agg(...), and json*_object_agg(...) are marked as parallel safe. You can check by executing the statement: select proname, proparallel from pg_proc where proname like 'array%'; and similarly for the json functions. I see no reason for those to prevent a parallel plan. I checked these in HEAD. If you can share your exact query for count(distinct ...), I might be able to help better. Feel free to discuss such things on pgsql-hackers (https://www.postgresql.org/list/pgsql-hackers/) or another PG mailing list.
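
The catalog check above can be sketched as a single query. The proparallel column is 's' (safe), 'r' (restricted), or 'u' (unsafe). The join to pg_aggregate and the aggcombinefn column are an addition that the comment does not mention. The assumption is that the planner can only split an aggregate across workers when a combine function exists, so an aggregate marked parallel safe but lacking one may still not appear as a partial aggregate in a plan.

```sql
-- List parallel-safety markings and combine functions for the aggregates
-- discussed in this thread. aggcombinefn shows "-" when there is none.
SELECT p.proname,
       p.proparallel,
       a.aggcombinefn
FROM pg_proc AS p
JOIN pg_aggregate AS a ON a.aggfnoid = p.oid
WHERE p.proname IN ('array_agg', 'json_agg', 'jsonb_agg',
                    'json_object_agg', 'jsonb_object_agg')
ORDER BY p.proname;
```

The result depends on the server version, so it is worth running against the release you actually use.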

The first point is CTE can contain DML statements ...

2020-02-19T19:46:28.804-08:00

The first point is that a CTE can contain DML statements like update/delete, so we won't be able to parallelise those, as we still don't have parallelism for DML statements. Now, if the CTE contains only read-only statements (Select queries), we can think of parallelising them, but I think this area requires more thought.

What about parallel CTE? Each CTE node on the sep...

2020-02-19T10:08:39.997-08:00

What about parallel CTEs, with each CTE node running on a separate worker?

Thank you very much for the interesting article! ...

2020-02-19T07:08:26.009-08:00

Thank you very much for the interesting article!

I think it would be another big step towards the goal of achieving more parallelism if the following aggregate functions became "parallel safe":

array_agg(...)
json*_agg(...)
json*_object_agg(...)
count(distinct ...)

I have seen several cases where these functions have prevented a parallel plan.
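
For the count(distinct ...) case, one commonly used rewrite is sketched below. The table events and column user_id are hypothetical. The idea is to deduplicate with GROUP BY in a subquery and then count the rows. An aggregate with DISTINCT cannot be split into partial aggregates, whereas the grouping step is an ordinary aggregation that the planner may be able to run in parallel workers. Whether it does depends on the server version, data size, and cost settings.

```sql
-- Original form: the DISTINCT aggregate cannot be computed as partial aggregates.
SELECT count(DISTINCT user_id) FROM events;

-- Rewrite: group first, then count the groups.
-- count(DISTINCT ...) ignores NULLs, so the NULL group is filtered out
-- to keep the two results identical.
SELECT count(*)
FROM (
    SELECT user_id
    FROM events
    WHERE user_id IS NOT NULL
    GROUP BY user_id
) AS s;
```

Running EXPLAIN on both forms shows whether the rewrite actually changes the plan for a given table.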