Search code examples
sqlpostgresqlwindow-functions

Unexpected behavior of window function first_value


I have 2 columns - order no, value. Table value constructor:

(1, null)
,(2, 5)
,(3, null)
,(4, null)
,(5, 2)
,(6, 1)

I need to get

(1, 5) -- i.e. first nonnull Value if I go from current row and order by OrderNo
,(2, 5)
,(3, 2) -- i.e. first nonnull Value if I go from current row and order by OrderNo
,(4, 2) -- analogous
,(5, 2)
,(6, 1)

This is query that I think should work.

;with SourceTable as (
    select *
        from (values
            (1, null)
            ,(2, 5)
            ,(3, null)
            ,(4, null)
            ,(5, 2)
            ,(6, 1)
        ) as T(OrderNo, Value)
)
select
       *
       ,first_value(Value) over (
           order by
               case when Value is not null then 0 else 1 end
               , OrderNo
           rows between current row and unbounded following
       ) as X
   from SourceTable
order by OrderNo

The issue is that it returns exactly same resultset as SourceTable. I don't understand why. E.g., if first row is processed (OrderNo = 1) I'd expect column X returns 5 because frame should include all rows (current row and unbound following) and it orders by Value - nonnulls first, then by OrderNo. So first row in frame should be OrderNo=2. Obviously it doesn't work like that but I don't get why.

Much appreciated if someone explains how is constructed the first frame. I need this for SQL Server and also Postgresql.

Many thanks


Solution

  • It's pretty easy to see why first_value doesn't work if you order the results by case when Value is not null then 0 else 1 end, orderno

     orderno | value | x
    ---------+-------+---
           2 |     5 | 5
           5 |     2 | 2
           6 |     1 | 1
           1 |       |
           3 |       |
           4 |       |
    (6 rows)
    

    For orderno=1, there's nothing after it in the frame that would be not-null.

    Instead, we can arrange the orders into groups using count as a window function in a sub-query. We then use max as a window function over that group (this is arbitrary, min would work just as well) to get the one non-null value in that group:

    with SourceTable as (
        select *
            from (values
                (1, null)
                ,(2, 5)
                ,(3, null)
                ,(4, null)
                ,(5, 2)
                ,(6, 1)
            ) as T(OrderNo, Value)
    )
    select orderno, order_group, max(value) OVER (PARTITION BY order_group) FROM (
    
        SELECT *,
           count(value) OVER (ORDER BY orderno DESC) as order_group
       from SourceTable
       ) as sub
    order by orderno;
     orderno | order_group | max
    ---------+-------------+-----
           1 |           3 |   5
           2 |           3 |   5
           3 |           2 |   2
           4 |           2 |   2
           5 |           2 |   2
           6 |           1 |   1
    (6 rows)