Search code examples
sqlpostgresqldatabase-performance

Postgres: STABLE function called multiple times on constant


I'm having a Postgresql (version 9.4) performance puzzle. I have a function (prevd) declared as STABLE (see below). When I run this function on a constant in where clause, it is called multiple times - instead of once. If I understand postgres documentation correctly, the query should be optimized to call prevd only once.

A STABLE function cannot modify the database and is guaranteed to return the same results given the same arguments for all rows within a single statement

Why it doesn't optimize calls to prevd in this case? I'm not expecting prevd to be called once for all subsequent queries using prevd on the same argument (like it was IMMUTABLE). I'm expecting postgres to create a plan for my query with just one call to prevd('2015-12-12')

Please find the code below:

Schema

create table somedata(d date, number double precision);
create table dates(d date);

insert into dates
select generate_series::date
from   generate_series('2015-01-01'::date, '2015-12-31'::date, '1 day');

insert into somedata
select '2015-01-01'::date + (random() * 365 + 1)::integer, random()
from   generate_series(1, 100000);

create or replace function prevd(date_ date)
returns date
language sql
stable
as $$
  select max(d) from dates where d < date_;
$$

Slow Query

select avg(number) from somedata where d=prevd('2015-12-12');

Poor query plan of the query above

 Aggregate  (cost=28092.74..28092.75 rows=1 width=8) (actual time=3532.638..3532.638 rows=1 loops=1)
   Output: avg(number)
   ->  Seq Scan on public.somedata  (cost=0.00..28091.43 rows=525 width=8) (actual time=10.210..3532.576 rows=282 loops=1)
         Output: d, number
         Filter: (somedata.d = prevd('2015-12-12'::date))
         Rows Removed by Filter: 99718
 Planning time: 1.144 ms
 Execution time: 3532.688 ms
(8 rows)

Performance

The query above, on my machine runs around 3.5s. After changing prevd to IMMUTABLE, it's changing to 0.035s.


Solution

  • I started writing this as a comment, but it got a bit long, so I'm expanding it into an answer.

    As discussed in this previous answer, Postgres does not promise to always optimise based on STABLE or IMMUTABLE annotations, only that it can sometimes do so. It does this by planning the query differently by taking advantage of certain assumptions. This part of the previous answer is directly analogous to your case:

    This particular sort of rewriting depends upon immutability or stability. With where test_multi_calls1(30) != num query re-writing will happen for immutable but not for merely stable functions.

    If you change the function to IMMUTABLE and look at the query plan, you will see that the rewriting it does is really rather radical:

    Seq Scan on public.somedata  (cost=0.00..1791.00 rows=272 width=12) (actual time=0.036..14.549 rows=270 loops=1)
      Output: d, number
      Filter: (somedata.d = '2015-12-11'::date)
      Buffers: shared read=541 written=14
    Total runtime: 14.589 ms
    

    It actually runs the function while planning the query, and substitutes the value before the query is even executed. With a STABLE function, this optimisation would clearly not be appropriate - the data might change between planning and executing the query.

    In a comment, it was mentioned that this query results in an optimised plan:

    select avg(number) from somedata where d=(select prevd(date '2015-12-12'));
    

    This is fast, but note that the plan doesn't look anything like what the IMMUTABLE version did:

    Aggregate  (cost=1791.69..1791.70 rows=1 width=8) (actual time=14.670..14.670 rows=1 loops=1)
      Output: avg(number)
      Buffers: shared read=541 written=21
      InitPlan 1 (returns $0)
        ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
              Output: '2015-12-11'::date
      ->  Seq Scan on public.somedata  (cost=0.00..1791.00 rows=273 width=8) (actual time=0.026..14.589 rows=270 loops=1)
            Output: d, number
            Filter: (somedata.d = $0)
            Buffers: shared read=541 written=21
    Total runtime: 14.707 ms
    

    By putting it into a sub-query, you are moving the function call from the WHERE clause to the SELECT clause. More importantly, the sub-query can always be executed once and used by the rest of the query; so the function is run once in a separate node of the plan.

    To confirm this, we can take the SQL out of a function altogether:

    select avg(number) from somedata where d=(select max(d) from dates where d <  '2015-12-12');
    

    This gives a rather longer plan with very similar performance:

    Aggregate  (cost=1799.12..1799.13 rows=1 width=8) (actual time=14.174..14.174 rows=1 loops=1)
      Output: avg(somedata.number)
      Buffers: shared read=543 written=19
      InitPlan 1 (returns $0)
        ->  Aggregate  (cost=7.43..7.44 rows=1 width=4) (actual time=0.150..0.150 rows=1 loops=1)
              Output: max(dates.d)
              Buffers: shared read=2
              ->  Seq Scan on public.dates  (cost=0.00..6.56 rows=347 width=4) (actual time=0.015..0.103 rows=345 loops=1)
                    Output: dates.d
                    Filter: (dates.d < '2015-12-12'::date)
                    Buffers: shared read=2
      ->  Seq Scan on public.somedata  (cost=0.00..1791.00 rows=273 width=8) (actual time=0.190..14.098 rows=270 loops=1)
            Output: somedata.d, somedata.number
            Filter: (somedata.d = $0)
            Buffers: shared read=543 written=19
    Total runtime: 14.232 ms
    

    The important thing to note is that the inner Aggregate (the max(d)) is executed once, on a separate node from the main Seq Scan (which is checking the where clause). In this position, even a VOLATILE function can be optimised in the same way.

    In short, while you know that the query you've produced can be optimised by executing the function only once, it doesn't match any of the patterns that Postgres's query planner knows how to rewrite, so it uses a naive plan which runs the function multiple times.

    [Note: all tests performed on Postgres 9.1, because it's what I happened to have to hand.]