I am trying to find gaps in enrollment and have a table set up like this:
ID | Enrollment _Month | Consecutive_Months |
---|---|---|
1 | 202403 | 1 |
1 | 202404 | 2 |
1 | 202405 | 3 |
1 | 202409 | 1 |
1 | 202410 | 2 |
1 | 202411 | 3 |
2 | 202401 | 1 |
2 | 202402 | 2 |
2 | 202407 | 1 |
2 | 202408 | 2 |
Using ID 1 as an example, the goal is to find the difference between enrollment months between 202405 (max consecutive months number in sequence) and 202409, where the "Consecutive Months" count restarts back to 1 for each ID. Is there a way to do this?
Is there another way to calculate a gap like this without using the consecutive months column?
Thank you!!
As already suggested in the comments, it would be better to make a Enrollment_Month column of the date type. Then you can use the functions of working with the date (DATEDIFF(month,...).
However, let's say you need it is an integer
.
First, we use lag() to get the previous value.
Then calculate the difference in months between the two dates (YYYYMM). The rest of query is the typical gaps and islands
task.
If currMonth (as Enrollment_Month) and prevMonth (as lag(Enrollment_Month)) are integers in format YYYYMM, we calculate difference as
(currMonth/100-prevMonth/100)*12+(currMonth%100-prevMonth%100) dif
If dif>1, there is Gap.
See example
with A as (
select ID,Enrollment_month currMonth
,lag(Enrollment_Month,1,Enrollment_Month)over(partition by ID order by Enrollment_Month) prevMonth
from test
)
,B as (
select *
,(currMonth/100-prevMonth/100)*12+(currMonth%100-prevMonth%100) dif
,sum(case when (currMonth/100-prevMonth/100)*12+(currMonth%100-prevMonth%100)>1 then 1 else 0 end)
over(partition by ID order by currMonth) seqN
from A
)
select *
,row_number()over(partition by ID,seqN order by currMonth) ConsecutiveMonths
from B
ID | currMonth | prevMonth | dif | seqN | ConsecutiveMonths |
---|---|---|---|---|---|
1 | 202403 | 202403 | 0 | 0 | 1 |
1 | 202404 | 202403 | 1 | 0 | 2 |
1 | 202405 | 202404 | 1 | 0 | 3 |
1 | 202407 | 202405 | 2 | 1 | 1 |
1 | 202409 | 202407 | 2 | 2 | 1 |
1 | 202410 | 202409 | 1 | 2 | 2 |
1 | 202411 | 202410 | 1 | 2 | 3 |
1 | 202503 | 202411 | 4 | 3 | 1 |
1 | 202504 | 202503 | 1 | 3 | 2 |
2 | 202401 | 202401 | 0 | 0 | 1 |
2 | 202402 | 202401 | 1 | 0 | 2 |
2 | 202407 | 202402 | 5 | 1 | 1 |
2 | 202408 | 202407 | 1 | 1 | 2 |
7 | 202410 | 202410 | 0 | 0 | 1 |
7 | 202411 | 202410 | 1 | 0 | 2 |
7 | 202412 | 202411 | 1 | 0 | 3 |
7 | 202501 | 202412 | 1 | 0 | 4 |
8 | 202410 | 202410 | 0 | 0 | 1 |
8 | 202501 | 202410 | 3 | 1 | 1 |
8 | 202502 | 202501 | 1 | 1 | 2 |
9 | 202412 | 202412 | 0 | 0 | 1 |
9 | 202501 | 202412 | 1 | 0 | 2 |
Shortly, the same
select *
,row_number()over(partition by ID,seqN order by currMonth) ConsecutiveMonths
from (
select *
,(currMonth/100-prevMonth/100)*12+(currMonth%100-prevMonth%100) dif
,sum(case when (currMonth/100-prevMonth/100)*12+(currMonth%100-prevMonth%100)>1 then 1 else 0 end)
over(partition by ID order by currMonth) seqN
from (
select ID,Enrollment_month currMonth
,lag(Enrollment_Month,1,Enrollment_Month)
over(partition by ID order by Enrollment_Month) prevMonth
from test
)A
)B
We do not use column Consecutive_Months
in this queries.