Search code examples
sqlpostgresqlnullgaps-and-islands

How to replace null values with the last preceding not-null value?


I have a table orders in Postgres with columns offer_id and date. I need to write a query which fills in all null cases in column offer_id, using the previous not-null value.

I tried to use the window function lag() but it only manages with one single NULL, not several. How to fix that?

image here

I tried this:

with orders as (
                  select 2 as offer_id, '2021-01-01'::date as date union all
                  select 3 as offer_id, '2021-01-02'::date as date union all
                  select null as offer_id, '2021-01-03'::date as date union all
                  select null as offer_id, '2021-01-04'::date as date union all
                  select null as offer_id, '2021-01-05'::date as date union all
                  select 4 as offer_id, '2021-01-07'::date as date union all
                  select 5 as offer_id, '2021-01-08'::date as date union all
                  select null as offer_id, '2021-01-09'::date as date union all
                  select 8 as offer_id, '2021-01-10'::date as date union all
                  select 9 as offer_id, '2021-01-11'::date as date union all
                  select null as offer_id, '2021-01-12'::date as date union all
                  select null as offer_id, '2021-01-13'::date as date union all
                  select 13 as offer_id, '2021-01-14'::date as date union all
                  select 13 as offer_id, '2021-01-15'::date as date union all
                  select null as offer_id, '2021-01-16'::date as date 
                  )
SELECT *, CASE WHEN offer_id IS NULL 
THEN LAG(offer_id) OVER (ORDER BY date) ELSE offer_id END AS updated_offer_id 
FROM orders

Solution

  • Can be solved with a subquery forming groups:

    SELECT offer_id, date
         , first_value(offer_id) OVER (PARTITION BY grp ORDER BY date) AS updated_offer_id
    FROM  (
       SELECT *, count(offer_id) OVER (ORDER BY date) AS grp
       FROM   orders
       ) sub;
    

    fiddle

    Since count() only counts not-null values, all rows with null fall into the one group with the preceding not-null value.
    In the outer SELECT, just pick the value of the first group member for all.

    Notably, this fills in the last preceding not-null value, not the greatest.

    Leading null values stay null for lack of a leading template value. Related: