From my understanding, I can use LAG() function to access the value of previous row. I am creating new columns 'survived' and 'disenrolled' based on the value of status from previous row. Month '2022-11' has status 1. Since it's not NULL I would expect the survived value for row with month '2022-12' to be 1 and disnerolled to be 0. But the results table have those values as NAs. What is happening here? Why are those values not showing?
select * from tmp_enrollment_long_2;
person_id | month | status |
---|---|---|
1234 | 2021-12 | 1 |
1234 | 2022-01 | 1 |
1234 | 2022-02 | 1 |
1234 | 2022-03 | 1 |
1234 | 2022-04 | 1 |
1234 | 2022-05 | 1 |
1234 | 2022-06 | 1 |
1234 | 2022-07 | 1 |
1234 | 2022-08 | 1 |
1234 | 2022-09 | 1 |
1234 | 2022-10 | 1 |
1234 | 2022-11 | 1 |
1234 | 2022-12 | 1 |
SELECT *
,CAST( lag( status IS NOT NULL, 1 ) OVER( partition BY person_id ORDER BY month DESC ) AS SMALLINT ) AS survived
,CAST( lag( status IS NULL, 1 ) OVER( partition BY person_id ORDER BY month DESC ) AS SMALLINT ) AS disenrolled
FROM tmp_enrollment_long_2;
person_id | month | status | survived | disenrolled |
---|---|---|---|---|
1234 | 2021-12 | 1 | 1 | 0 |
1234 | 2022-01 | 1 | 1 | 0 |
1234 | 2022-02 | 1 | 1 | 0 |
1234 | 2022-03 | 1 | 1 | 0 |
1234 | 2022-04 | 1 | 1 | 0 |
1234 | 2022-05 | 1 | 1 | 0 |
1234 | 2022-06 | 1 | 1 | 0 |
1234 | 2022-07 | 1 | 1 | 0 |
1234 | 2022-08 | 1 | 1 | 0 |
1234 | 2022-09 | 1 | 1 | 0 |
1234 | 2022-10 | 1 | 1 | 0 |
1234 | 2022-11 | 1 | 1 | 0 |
1234 | 2022-12 | 1 |
"value of status from previous row" using window of "ORDER BY month DESC" actually means "value from next month". In your data 2022-12 is last month, available, there is no next month - therefore the values are null.
If you need a non-null value you can use coalesce
function to get some reasonable default:
coalesce(lag( status IS NOT NULL, 1 ), 1)