Search code examples
postgresqlperformancequery-optimization

SQL query lost perfomance after adding filter with subselect


I have a function that totally lost it's performance after i added another filter to it in postgresql

Here is a simple example of how it looked like at first with good performance.

        CREATE OR REPLACE FUNCTION my_function(param_a boolean, param_b boolean )
     RETURNS TABLE(blablabla)
     LANGUAGE sql
     IMMUTABLE
    AS $function$
    with data as (
    select id,amount,account_nr from transfer
    )
    select * from 
    data d
    where  param_a or 0.00 <> (select sum(d2.amount)
                            from data d2
                            where d2.id = d.id)
    $function$
    ;
    (cost=0.25..10.25 rows=1000 width=560)  
    (actual time=1162.528..1162.561 rows=306 loops=1)  
    Buffers: shared hit=1099180  
    Planning time: 2.928 ms  
    Execution time: 1162.630 ms

After i added another filter with a subselect and count i lost my perfomance. Is this count so bad for performance and can i solve it another way?

CREATE OR REPLACE FUNCTION my_function(param_a boolean, param_b boolean )
 RETURNS TABLE(blablabla)
 LANGUAGE sql
 IMMUTABLE
AS $function$
with data as (
select id,amount,account_nr from transfer
)
select * from 
data d
where (param_b or 1 < (select count(d2.account_nr) 
                        from data d2
                        where d2.id = d.id
                        group by d2.account_nr))
and (param_a or 0.00 <> (select sum(d2.amount)
                        from data d2
                        where d2.id = d.id))
$function$
;
    (cost=0.25..10.25 rows=1000 width=560)  
    (actualtime=271191.341..271191.383 rows=306 loops=1)  
    Buffers: shared hit=1099180  
    Planning time: 2.955 ms 
    Execution time: 271191.463 ms

Solution

  • Your slow query, embeddded in your stored function, is this:

    with data as (             -- original query from the question.
       select id,amount,account_nr from transfer
    )
    select * 
      from data d
     where (param_b or 1 < (select count(d2.account_nr) 
                            from data d2
                            where d2.id = d.id
                            group by d2.account_nr)
           )
       and (param_a or 0.00 <> (select sum(d2.amount)
                            from data d2
                            where d2.id = d.id)
           )
    

    This has a pointless common table expression. We can get rid of it for simplicity's sake. You can always put it back if you need it for some other purpose.

    And has a couple of correlated subqueries. Let's refactor them into a single independent subquery. Starting with that independent subquery:

                select id, 
                       count(account_nr) account_count,
                       sum(amount) total_amount
                  from transfer
                 group by id
    

    This aggregate subquery generates the number of accounts and the total amount for each id in your transfer table. Eyeball the results to convince yourself it does what you need it to do.

    Then we can join this to the main query and apply your WHERE conditions.

    select d.id, d.amount, d.account_nr 
      from transfer d
      join (
                select id, 
                       count(account_nr) account_count,
                       sum(amount) total_amount
                  from transfer
                 group by id
           )  d2 ON d.id = d2.id
     where (param_b or 1 < d2.account_count)
       and (param_a or 0.00 <> d2.total_amount)
    

    Using the independent subquery can speed things up a lot; sometimes the query planner decides it needs to re-evaluate the dependent subquery many times.

    The following index will help the subquery run faster.

    CREATE INDEX id_details ON transfer (id) INCLUDE (account_nr, amount);
    

    Convince yourself this works and is fast enough. (I did not debug it, because I don't have your data.) You'll need to test it substituting true and false for param_a and param_b.

    Then, and only then, put it into your stored function.