Search code examples
sqllag

why the Last value of LAG function does not yield any value?


From my understanding, I can use LAG() function to access the value of previous row. I am creating new columns 'survived' and 'disenrolled' based on the value of status from previous row. Month '2022-11' has status 1. Since it's not NULL I would expect the survived value for row with month '2022-12' to be 1 and disnerolled to be 0. But the results table have those values as NAs. What is happening here? Why are those values not showing?

select * from tmp_enrollment_long_2;

person_id month status
1234 2021-12 1
1234 2022-01 1
1234 2022-02 1
1234 2022-03 1
1234 2022-04 1
1234 2022-05 1
1234 2022-06 1
1234 2022-07 1
1234 2022-08 1
1234 2022-09 1
1234 2022-10 1
1234 2022-11 1
1234 2022-12 1
SELECT * 
,CAST( lag( status IS NOT NULL, 1 ) OVER( partition BY person_id ORDER BY month DESC ) AS SMALLINT ) AS survived
,CAST( lag( status IS     NULL, 1 ) OVER( partition BY person_id ORDER BY month DESC ) AS SMALLINT ) AS disenrolled
FROM tmp_enrollment_long_2;
person_id month status survived disenrolled
1234 2021-12 1 1 0
1234 2022-01 1 1 0
1234 2022-02 1 1 0
1234 2022-03 1 1 0
1234 2022-04 1 1 0
1234 2022-05 1 1 0
1234 2022-06 1 1 0
1234 2022-07 1 1 0
1234 2022-08 1 1 0
1234 2022-09 1 1 0
1234 2022-10 1 1 0
1234 2022-11 1 1 0
1234 2022-12 1

Solution

  • "value of status from previous row" using window of "ORDER BY month DESC" actually means "value from next month". In your data 2022-12 is last month, available, there is no next month - therefore the values are null.

    If you need a non-null value you can use coalesce function to get some reasonable default:

    coalesce(lag( status IS NOT NULL, 1 ), 1)