Search code examples
citus

Citus parallelism limitations


I'm experimenting with Citus extension for PostgreSQL to enable parallelism for my analythics queries (single node setup). I have a very simple UPDATE statement for a distributed table which runs perfectly parallel when execute from the IDE (DBeaver) or psql command:

    UPDATE my_distributed_table a1
    SET
        col1 = b.col1,
        col2 = b.col2
    FROM my_distributed_table a
    LEFT JOIN my_local_table b
        ON b.id = a.report_id
    WHERE
        a.id = a1.id and
        a.report_id = a1.report_id;  -- Sharding key

The table is quite big (> 300mln records) so I can see 32 separate sessions (1 per shard I guess) during the query exection. But, when I put this UPDATE query into a BEGIN/COMMIT transaction, or into a stored procedure, or even a LANGUAGE SQL function (which is not supposed to start a transaction), and execute it with either CALL or SELECT, there is no parallelism! There is only one session which takes forever.

I tried several tweaks like these but no luck:

SET citus.max_adaptive_executor_pool_size = 32; -- DEFAULT 16
SET citus.force_max_query_parallelization to OF; -- DEFAULT off

What I need is to put this UPDATE into either a stored procedureo or a function and make it executing in parallel.

I could not find any clear statemens by googling or looking in Citus docs regarding support of parallelism in transactions, functions or stored procedures, but I believe it must be supported. Maybe there is some setting I'm missing?

Thank you


Solution

  • Adding "ANALYZE my_distributed_table;" before the UPDATE statement solved the problem. Now no matter if is it executed as a standalone, from a transaction, or from a procedure/function, multiple sessions are created as expected.