Tags: query-optimization, postgresql-9.3, window-functions, sql-execution-plan

Bad optimization/planning on Postgres window-based queries (partition by, group by?) - 1000x speedup


We are running Postgres 9.3.5 (07/2014). We have a fairly complex data warehouse/reporting setup in place (ETL, materialized views, indexing, aggregations, analytical functions, ...).

What I just discovered may be difficult to implement in the optimizer (?), but it makes a huge difference in performance. The sample code below closely resembles our real query, with unnecessary complexity removed:

create view foo as
select
  sum(s.plan) over w_pyl as pyl_plan,      -- money planned to spend in this pot/loc/year
  sum(s.booked) over w_pyl as pyl_booked,  -- money already booked in this pot/loc/year

  -- money already booked in this pot/loc the years before (stored as sum already)
  last_value(s.booked_prev_years) over w_pl as pl_booked_prev_years,    

  -- update 2014-10-08: the following additional selected columns may be
  -- implementation-/test-relevant, since they could potentially be determined
  -- by sorting within the partition:
  min(s.id) over w_pyl as pyl_min_id,
  max(s.id) over w_pyl as pyl_max_id,

  -- ... anything could follow here ...
  x.*,
  s.*
from
  pot_location_year x  -- may be some materialized view or (cache/regular) table
  left outer join  spendings s 
    on (s.pot = x.pot and s.loc = x.loc and s.year = x.year)
window
  w_pyl  as (partition by  x.pot, x.year, x.loc),
  w_pl   as (partition by  x.pot, x.loc  order by x.year)

We have these two relevant indexes in place:

pot_location_year_idx__p_y_l  -- on pot, year, loc
pot_location_year_idx__p_l_y  -- on pot, loc, year
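
For reference, a minimal sketch of the DDL behind these two indexes (index and column names taken from above; everything else is assumed):

create index pot_location_year_idx__p_y_l on pot_location_year (pot, year, loc);
create index pot_location_year_idx__p_l_y on pot_location_year (pot, loc, year);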

Now we run an explain on a test query:

explain select * from foo fetch first 100 rows only

This shows us some very bad performance: the p_y_l index is used, so the result set has to be unnecessarily sorted twice :-( (the outermost WindowAgg/Sort step sorts by pot, loc, year, because that ordering is required for our last_value(..) over w_pl as pl_booked_prev_years):

 Limit  (cost=289687.87..289692.12 rows=100 width=512)
   ->  WindowAgg  (cost=289687.87..292714.85 rows=93138 width=408)
         ->  Sort  (cost=289687.87..289920.71 rows=93138 width=408)
               Sort Key: x.pot, x.loc, x.year
               ->  WindowAgg  (cost=1.25..282000.68 rows=93138 width=408)
                     ->  Nested Loop Left Join  (cost=1.25..278508.01 rows=93138 width=408)
                           Join Filter: ...
                           ->  Nested Loop Left Join  (cost=0.83..214569.60 rows=93138 width=392)
                                 ->  Index Scan using pot_location_year_idx__p_y_l on pot_location_year x  (cost=0.42..11665.49 rows=93138 width=306)
                                 ->  Index Scan using ...  (cost=0.41..2.17 rows=1 width=140)
                                       Index Cond: ...
                           ->  Index Scan using ...  (cost=0.41..0.67 rows=1 width=126)
                                 Index Cond: ...

So the obvious problem is that the planner should choose the existing p_l_y index instead, so it does not have to sort twice.
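
As a quick planner experiment (a debugging aid only, not a production setting; enable_sort is a standard Postgres GUC), one can penalize explicit sorts and see whether the planner can produce an index-ordered plan at all:

-- planner experiment only: make sort nodes very expensive, so plans that
-- get their ordering from an index win if such a plan exists (revert afterwards!)
set enable_sort = off;
explain select * from foo fetch first 100 rows only;
reset enable_sort;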


Solution

  • Luckily I found out that I could give the planner an (implicit) hint to do this by making the column order of the view's other partitions/windows more homogeneous, although this is not semantically necessary (the order of PARTITION BY columns does not affect the result).

    The following change now returned what I had expected to get in the first place (use of the p_l_y index):

    ...
    window
      -- w_pyl  as (partition by  x.pot, x.year, x.loc),  -- showstopper (from above)
         w_pyl  as (partition by  x.pot, x.loc, x.year),  -- speedy
         w_pl   as (partition by  x.pot, x.loc  order by x.year)
    

    The result, now performing 1000 times faster:

     Limit  (cost=1.25..308.02 rows=100 width=512)
       ->  WindowAgg  (cost=1.25..284794.82 rows=93138 width=408)
             ->  WindowAgg  (cost=1.25..282000.68 rows=93138 width=408)
                   ->  Nested Loop Left Join  (cost=1.25..278508.01 rows=93138 width=408)
                         Join Filter: ...
                         ->  Nested Loop Left Join  (cost=0.83..214569.60 rows=93138 width=392)
                               ->  Index Scan using pot_location_year_idx__p_l_y on pot_location_year x  (cost=0.42..11665.49 rows=93138 width=306)
                               ->  Index Scan using ...  (cost=0.41..2.17 rows=1 width=140)
                                     Index Cond: ...
                         ->  Index Scan using ...  (cost=0.41..0.67 rows=1 width=126)
                               Index Cond: ...
    

    Update 2014-10-09:

    Tom Lane (one of the core Postgres developers) wrote the following in 2013-02, regarding PG 9.2.2 and another, likely related window-function problem that I am facing here as well:

    ... There's not nearly that amount of intelligence in the system about window functions, as yet. So you'll have to write out the query longhand and put the WHERE clause at the lower level, if you want this optimization to happen.
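
    To illustrate what "writing out the query longhand" might look like here, a minimal sketch (with a reduced column list, and assuming pot_id is a column of pot_location_year): the view body is inlined as a subquery, and the WHERE clause is placed below the window aggregation:

     select *
     from (
       select
         sum(s.plan)   over w_pyl as pyl_plan,
         sum(s.booked) over w_pyl as pyl_booked,
         x.*
       from pot_location_year x
         left outer join spendings s
           on (s.pot = x.pot and s.loc = x.loc and s.year = x.year)
       where x.pot_id = '12345'  -- filter applied below the window aggregation
       window w_pyl as (partition by x.pot, x.loc, x.year)
     ) sub
     fetch first 100 rows only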

    So here are some more (debatable) general thoughts on the subject of window functions, data warehouse functionality etc. that could be considered here:

    The above is a good statement to strengthen the assumption I made when an Oracle->Postgres migration was decided on for general projects and for a DWH environment: the risk of spending much more time and money on such a migration is quite high, although the investigated functionality may seem sufficient.

    In important areas I like Postgres much more than Oracle, looking e.g. at the syntax and clarity of the code (I guess even the source code, and thus maintainability in all its aspects, is much better there), but Oracle is clearly the much more advanced player in resource optimization, support and tooling when you are dealing with more complex database functionality outside typical CRUD management.

    I guess open-source Postgres (as well as the EnterpriseDB top-ups) will catch up in those areas in the long run, but it will take at least 10 years, and maybe only if it is pushed heavily by big, altruistic1 global players like Google.

    1 altruistic in the sense that, if the pushed areas stay "free", the benefit for those companies must surely lie somewhere else (maybe with some advertisement rows added randomly - I guess we could live with that here and there ;))


    Update 2014-10-13:

    As linked in my previous update above (2014-10-09), the optimization problems and their workaround solutions continue in a quite similar way (after the above fix) when you want to query the view with constraints/filters (here on pot_id):

    explain select * from foo where pot_id = '12345' fetch first 100 rows only
    

    ...

     Limit  (cost=1.25..121151.44 rows=100 width=211)
       ->  Subquery Scan on foo  (cost=1.25..279858.20 rows=231 width=211)
             Filter: ((foo.pot_id)::text = '12345'::text)
             ->  WindowAgg  (cost=1.25..277320.53 rows=203013 width=107)
                   ->  WindowAgg  (cost=1.25..271230.14 rows=203013 width=107)
                         ->  Nested Loop Left Join  (cost=1.25..263617.16 rows=203013 width=107)
                               ->  Merge Left Join  (cost=0.83..35629.02 rows=203013 width=91)
                                     Merge Cond: ...
                                     ->  Index Scan using pot_location_year_idx__p_l_y on pot_location_year x  (cost=0.42..15493.80 rows=93138 width=65)
                                     ->  Materialize  (cost=0.41..15459.42 rows=33198 width=46)
                                           ->  Index Scan using ...  (cost=0.41..15376.43 rows=33198 width=46)
                               ->  Index Scan using ...  (cost=0.42..1.11 rows=1 width=46)
                                     Index Cond: ...
    

    And as suggested in the link above, if you want the constraint/filter "pushed down" below the window aggregation, you have to apply it explicitly in the view itself, which then makes this type of query efficient, with another 1000x speedup for the first 100 rows:

     create view foo as
     ...
     where pot_id='12345'
     ...
    

    ...

     Limit  (cost=1.25..943.47 rows=100 width=211)
       ->  WindowAgg  (cost=1.25..9780.52 rows=1039 width=107)
             ->  WindowAgg  (cost=1.25..9751.95 rows=1039 width=107)
                   ->  Nested Loop Left Join  (cost=1.25..9715.58 rows=1039 width=107)
                         ->  Nested Loop Left Join  (cost=0.83..1129.47 rows=1039 width=91)
                               ->  Index Scan using pot_location_year_idx__p_l_y on pot_location_year x (cost=0.42..269.77 rows=106 width=65)
                                     Index Cond: ((pot_id)::text = '12345'::text)
                               ->  Index Scan using ...  (cost=0.41..8.10 rows=1 width=46)
                                     Index Cond: ...
                         ->  Index Scan using ...  (cost=0.42..8.25 rows=1 width=46)
                               Index Cond: ...
    

    After some more view-parameterization effort2, this approach will help speed up certain queries that constrain those columns, but it is still quite inflexible regarding more general usage of the foo view and query optimization.

    2: You can "parameterize" such a view by putting its SQL in a (set-returning) table function (the Postgres equivalent of an Oracle pipelined table function). Further details regarding this may be found in the forum link above.
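
    A minimal sketch of that parameterization approach, with a reduced column list; the function name, parameter name and column types are assumptions, not taken from the forum link:

     -- hypothetical sketch: wrap the view's SQL in a set-returning SQL function,
     -- so the filter parameter is applied below the window aggregation
     create or replace function foo_by_pot(p_pot_id text)
     returns table (pyl_plan numeric, pyl_booked numeric, pot text, loc text, year int)
     language sql stable
     as $$
       select
         sum(s.plan)   over w_pyl,
         sum(s.booked) over w_pyl,
         x.pot, x.loc, x.year
       from pot_location_year x
         left outer join spendings s
           on (s.pot = x.pot and s.loc = x.loc and s.year = x.year)
       where x.pot_id = p_pot_id  -- parameter replaces the hard-coded filter
       window w_pyl as (partition by x.pot, x.loc, x.year)
     $$;

     -- usage:
     select * from foo_by_pot('12345') fetch first 100 rows only;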