Search code examples
sqlpostgresqlwindow-functions

Postgres historical dependency for field with arbitrary LAG


I have a postgres table that looks like this:

 refnum | period | flag 
--------+--------+------
 ref01  | 201701 |    0
 ref01  | 201702 |    0
 ref01  | 201703 |    1
 ref01  | 201704 |    0
 ref01  | 201705 |    0
 ref01  | 201706 |    1
 ref01  | 201707 |    0

Where period is basically just a year-month timestamp as an int. I want to make it so that we add a column that maintains the last period where flag = 1. So it should look like this at the end:

 refnum | period | flag | lastPeriod
--------+--------+------+------------
 ref01  | 201701 |    0 | NULL
 ref01  | 201702 |    0 | NULL
 ref01  | 201703 |    1 | 201703
 ref01  | 201704 |    0 | 201703
 ref01  | 201705 |    0 | 201703
 ref01  | 201706 |    1 | 201706
 ref01  | 201707 |    0 | 201706

So that the lastPeriod column for every row is dependent upon the materialized value for that column in the row before. I tried doing this with windowing functions and LAG, but that really only works when you know the number of rows you want to look back to. It would be great if each row in the window was evaluated after the one before, but it seems as though they are run independently of the value before. What I basically want is something along the lines of:

SELECT CASE WHEN current_row.flag = 1 THEN current_row.period ELSE prev_row.lastPeriod

I figured out one way around this, but it involves essentially creating a temp table with all the periods where flag = 1, joining to that table, then pulling the max:

select refnum, period, max(backfill) FROM 
(
    select refnum, a.period as period, b.period as backfill
    FROM my_table a
    LEFT JOIN tmp_periods b ON a.period >= b.period
) as foo group by refnum, period order by period;

But I was hoping that we could do this in a better way, since the tables we are looking at are pretty huge.


Solution

  • You care looking for a conditional max:

    select t.*,
           max(case when flag = 1 then period end) over (partition by refnum order by period) as lastperiod
    from t;
    

    More recent versions of Postgres support filter:

    select t.*,
           max(period) filter (where flag = 1) over (partition by refnum order by period)
    from t;