Search code examples
sqlsql-servert-sqlwindow-functionsgaps-and-islands

Calculate Date Difference for Non-Consecutive Months


I am trying to find gaps in enrollment and have a table set up like this:

ID Enrollment _Month Consecutive_Months
1 202403 1
1 202404 2
1 202405 3
1 202409 1
1 202410 2
1 202411 3
2 202401 1
2 202402 2
2 202407 1
2 202408 2

Using ID 1 as an example, the goal is to find the difference between enrollment months between 202405 (max consecutive months number in sequence) and 202409, where the "Consecutive Months" count restarts back to 1 for each ID. Is there a way to do this?

Is there another way to calculate a gap like this without using the consecutive months column?

Thank you!!


Solution

  • As already suggested in the comments, it would be better to make a Enrollment_Month column of the date type. Then you can use the functions of working with the date (DATEDIFF(month,...).

    However, let's say you need it is an integer.
    First, we use lag() to get the previous value.
    Then calculate the difference in months between the two dates (YYYYMM). The rest of query is the typical gaps and islands task.

    If currMonth (as Enrollment_Month) and prevMonth (as lag(Enrollment_Month)) are integers in format YYYYMM, we calculate difference as

    (currMonth/100-prevMonth/100)*12+(currMonth%100-prevMonth%100)  dif
    

    If dif>1, there is Gap.

    See example

    with A as (
      select ID,Enrollment_month currMonth 
        ,lag(Enrollment_Month,1,Enrollment_Month)over(partition by ID order by Enrollment_Month) prevMonth
      from test
    )
    ,B as (
    select *
      ,(currMonth/100-prevMonth/100)*12+(currMonth%100-prevMonth%100)  dif
      ,sum(case when (currMonth/100-prevMonth/100)*12+(currMonth%100-prevMonth%100)>1 then 1 else 0 end)
           over(partition by ID order by currMonth) seqN
    from A
    )
    select *
      ,row_number()over(partition by ID,seqN order by currMonth) ConsecutiveMonths
    from B
    
    ID currMonth prevMonth dif seqN ConsecutiveMonths
    1 202403 202403 0 0 1
    1 202404 202403 1 0 2
    1 202405 202404 1 0 3
    1 202407 202405 2 1 1
    1 202409 202407 2 2 1
    1 202410 202409 1 2 2
    1 202411 202410 1 2 3
    1 202503 202411 4 3 1
    1 202504 202503 1 3 2
    2 202401 202401 0 0 1
    2 202402 202401 1 0 2
    2 202407 202402 5 1 1
    2 202408 202407 1 1 2
    7 202410 202410 0 0 1
    7 202411 202410 1 0 2
    7 202412 202411 1 0 3
    7 202501 202412 1 0 4
    8 202410 202410 0 0 1
    8 202501 202410 3 1 1
    8 202502 202501 1 1 2
    9 202412 202412 0 0 1
    9 202501 202412 1 0 2

    Shortly, the same

    select *
      ,row_number()over(partition by ID,seqN order by currMonth) ConsecutiveMonths
    from (
      select *
        ,(currMonth/100-prevMonth/100)*12+(currMonth%100-prevMonth%100)  dif
        ,sum(case when (currMonth/100-prevMonth/100)*12+(currMonth%100-prevMonth%100)>1 then 1 else 0 end)
           over(partition by ID order by currMonth) seqN
      from (
      select ID,Enrollment_month currMonth 
          ,lag(Enrollment_Month,1,Enrollment_Month)
                over(partition by ID order by Enrollment_Month) prevMonth
        from test
      )A
    )B
    

    We do not use column Consecutive_Months in this queries.

    fiddle