Search code examples
sqlpostgresqlwindow-functionsmedian

Filling in missing values with a median in postgres


How can I replace avg with a median calculation in this?

select *
, coalesce(val, avg(val) over (order by t rows between 3 preceding and 1 preceding)) as fixed
from (
    values
    (1, 10),
    (2, NULL),
    (3, 10),
    (4, 15),
    (5, 11),
    (6, NULL),
    (7, NULL),
    (8, NULL),
    (9, NULL)
) as test(t, val)
;

Is there a legal version of this?

percentile_cont(0.5) within group(order by val) over (order by t rows between 3 preceding and 1 preceding)

Solution

  • Unfortunately percentile_cont() is an aggregate function, for which there is no equivalent window function.

    One workaround is to use an inline subquery to do the aggregate computation.

    If ids are always increasing, then you can do:

    select 
        t.*,
        coalesce(
            t.val, 
            (
                select percentile_cont(0.5) within group(order by t1.val)
                from test t1
                where t1.id between t.id - 3 and t.id - 1
            )
        ) fixed
    from test t
    

    Otherwise, you need an additional level of nesting:

    select 
        t.*,
        coalesce(
            t.val, 
            (
                select percentile_cont(0.5) within group(order by t1.val)
                from (select val from test t1 where t1.id < t.id order by t1.id desc limit 3) t1
            )
        ) fixed
    from test t
    

    Demo on DB Fiddle - both queries yield:

    id |  val | fixed
    -: | ---: | :----
     1 |   10 | 10   
     2 | null | 10   
     3 |   10 | 10   
     4 |   15 | 15   
     5 |   11 | 11   
     6 | null | 11   
     7 | null | 13   
     8 | null | 11   
     9 | null | null