Search code examples
sqlsql-servert-sqlsql-server-2012-express

How to Determine consecutive date count/days in SQL Server "finding islands" (consecutive rows)


I have a student table and I would like to know How long their session/training has lasted. I would like to exclude weekends, but I want to count continuous days excluding weekends. A class has Start Date and End Date, for example, student ID S1 a can book a class on Jan then again on Feb and I would like to know how many days for Jan bookings and February while excluding weekends. Basically, I am looking for continuous dates from Start Date to End date by student id where there no break except for the weekend.

SELECT 
 [ID]
,[StartDate]
,[EndDate]
,[BookingDays] AS Consecutive_Booking
FROM StudentBooking

If the student (student classifications(Type)) has book the class for 5 days or 2 times (Start Date to End Date (Monday to Friday)) in the last 3 months they are Resident else Visitors. Start Date and End date is recorded as of Monday - Friday only. Please note student ID 1 has a continuous date and this should be counted as a block. (02/01/2018-12/01/2018) second block 22/01-26/01

I would like to replicate below table.

ID   StartDate  EndDate     Duration     Type
1   02/01/2018  05/01/2018              ==>Please Note have continous dates
1   08/01/2018  12/01/2018   9           Resident
1   22/01/2018  26/01/2018   5           Resident 
2   23/01/2018  26/01/2018   4           Visitor
3   29/01/2018  31/01/2018   3           Visitor

Solution

  • Here's my solution to your problem.

    In the CTE "comparison", I join every record with this and all following records for that student. This way, I have a possible starting point (from the left side of the join) of a continuous training block and a possible end of such a block (from the right side of the join). Using "cross applies", I calculate 2 values:

    1. the workdays from the start of the possible first to the end of the possible last interval of a chain
    2. the workdays in just the possible last interval in the chain.

    On the latter values, using a windows function, I build a running total of the workdays from the possible start and end intervals. You tagged the question with "SQL 2012", so using this window function should be possible.

    In the next CTE ("sorting"), I restict the previous results to those where the running total equals the workdays between first start date and last end date. This way, only continuous blocks are left. These are then numbered in 2 ways:

    1. continuous blocks with the same EndDate are numberd by ascending StartDate
    2. continuous blocks with the same StartDate are numberd by descending EndDate.

    For every EndDate, I want the earliest StartDate, and for this StartDate, I want the latest EndDate only, so I filter for 1 in both numberings. Here it is:

    WITH
      comparison (ID, StartDate, EndDate, TotalDays, SumSingleDays) AS (
        SELECT bStart.ID, bStart.StartDate, bEnd.EndDate, Workdays.Total
          , SUM(Workdays.Single) OVER (
              PARTITION BY bStart.ID, bStart.StartDate 
              ORDER BY bEnd.StartDate
              ROWS UNBOUNDED PRECEDING)
        FROM StudentBookings bStart
          INNER JOIN StudentBookings bEnd 
            ON bStart.ID = bEnd.ID AND bStart.StartDate <= bEnd.StartDate
          CROSS APPLY (VALUES (
            DATEDIFF(day, 0, bStart.StartDate), 
            DATEDIFF(day, 0, bEnd.StartDate), 
            1+DATEDIFF(day, 0, bEnd.EndDate))
          ) d (s1, s2, e2)
          CROSS APPLY (VALUES (
            (d.e2 - d.s1) - (d.e2/7 - d.s1/7) - ((d.e2+1)/7 - (d.s1+1)/7),
            (d.e2 - d.s2) - (d.e2/7 - d.s2/7) - ((d.e2+1)/7 - (d.s2+1)/7))
          ) Workdays (Total, Single)
      ),
      sorting (ID, StartDate, EndDate, Duration, RowNumStart, RowNumEnd) AS (
        SELECT ID, StartDate, EndDate, TotalDays
          , ROW_NUMBER() OVER (PARTITION BY ID, EndDate ORDER BY StartDate)
          , ROW_NUMBER() OVER (PARTITION BY ID, StartDate ORDER BY EndDate DESC)
        FROM comparison
        WHERE TotalDays = SumSingleDays
      )
    SELECT ID, StartDate, EndDate, Duration
      , CASE WHEN Duration >= 5 THEN 'Resident' ELSE 'Visitor' END AS [Type]
    FROM sorting 
    WHERE (RowNumStart = 1) 
      AND (RowNumEnd = 1)
    ORDER BY ID, StartDate;
    

    The results:

    enter image description here

    Maybe there's a more elegant way to solve this using the interval packing solution by Itzik Ben-Gan, I'll post it when I figured that out.

    Added:

    Additionally, I count the number of bookings of all booking blocks and build a sum by student (ID) to make the "Resident" decision in the end. Bookings are restricted to the last 3 months in the first CTE (comparison):

    WITH
      comparison (ID, StartDate, EndDate, TotalDays, CountBookings, SumSingleDays) AS (
        SELECT bStart.ID, bStart.StartDate, bEnd.EndDate, Workdays.Total
          , COUNT(Workdays.Single) OVER (
              PARTITION BY bStart.ID, bStart.StartDate 
              ORDER BY bEnd.StartDate
              ROWS UNBOUNDED PRECEDING)
          , SUM(Workdays.Single) OVER (
              PARTITION BY bStart.ID, bStart.StartDate 
              ORDER BY bEnd.StartDate
              ROWS UNBOUNDED PRECEDING)
        FROM StudentBookings bStart
          INNER JOIN StudentBookings bEnd 
            ON bStart.ID = bEnd.ID AND bStart.StartDate <= bEnd.StartDate
          CROSS APPLY (VALUES (
            DATEDIFF(day, 0, bStart.StartDate), 
            DATEDIFF(day, 0, bEnd.StartDate), 
            1+DATEDIFF(day, 0, bEnd.EndDate))
          ) d (s1, s2, e2)
          CROSS APPLY (VALUES (
            (d.e2 - d.s1) - (d.e2/7 - d.s1/7) - ((d.e2+1)/7 - (d.s1+1)/7),
            (d.e2 - d.s2) - (d.e2/7 - d.s2/7) - ((d.e2+1)/7 - (d.s2+1)/7))
          ) Workdays (Total, Single)
        WHERE bStart.StartDate >= DATEADD(month, -3, GETDATE())
      ),
      sorting (ID, StartDate, EndDate, Duration, CountBookings, RowNumStart, RowNumEnd) AS (
        SELECT ID, StartDate, EndDate, TotalDays, CountBookings
          , ROW_NUMBER() OVER (PARTITION BY ID, EndDate ORDER BY StartDate)
          , ROW_NUMBER() OVER (PARTITION BY ID, StartDate ORDER BY EndDate DESC)
        FROM comparison
        WHERE TotalDays = SumSingleDays
      ),
     counting (ID, StartDate, EndDate, Duration, Bookings) AS (
      SELECT ID, StartDate, EndDate, Duration
        , SUM(CountBookings) OVER (PARTITION BY ID)
      FROM sorting WHERE (RowNumStart = 1) AND (RowNumEnd = 1)
    )
    SELECT ID, StartDate, EndDate, Duration, Bookings
      , CASE 
          WHEN Duration >= 5 OR Bookings >= 2 THEN 'Resident' ELSE 'Visitor'
        END AS [Type]
    FROM counting
    ORDER BY ID, StartDate;
    

    Filtering ClasseReferences:

    The ClassReference will be taken and filtered from the bStart table reference. To be able to add this field to the final query, it also has to be used to join the bEndtable reference, so only booking intervals with the same ClassReference value will be connected to blocks:

    WITH
      comparison (ID, ClassReference, StartDate, EndDate, TotalDays, CountBookings, SumSingleDays) AS (
        SELECT bStart.ID, bStart.ClassReference, bStart.StartDate, bEnd.EndDate, Workdays.Total
          , COUNT(Workdays.Single) OVER (
              PARTITION BY bStart.ID, bStart.StartDate 
              ORDER BY bEnd.StartDate
              ROWS UNBOUNDED PRECEDING)
          , SUM(Workdays.Single) OVER (
              PARTITION BY bStart.ID, bStart.StartDate 
              ORDER BY bEnd.StartDate
              ROWS UNBOUNDED PRECEDING)
        FROM StudentBookings bStart
          INNER JOIN StudentBookings bEnd 
            ON bStart.ID = bEnd.ID AND bStart.StartDate <= bEnd.StartDate
           AND bStart.ClassReference = bEnd.ClassReference
          CROSS APPLY (VALUES (
            DATEDIFF(day, 0, bStart.StartDate), 
            DATEDIFF(day, 0, bEnd.StartDate), 
            1+DATEDIFF(day, 0, bEnd.EndDate))
          ) d (s1, s2, e2)
          CROSS APPLY (VALUES (
            (d.e2 - d.s1) - (d.e2/7 - d.s1/7) - ((d.e2+1)/7 - (d.s1+1)/7),
            (d.e2 - d.s2) - (d.e2/7 - d.s2/7) - ((d.e2+1)/7 - (d.s2+1)/7))
          ) Workdays (Total, Single)
        WHERE bStart.StartDate >= DATEADD(month, -3, GETDATE())
          AND bStart.ClassReference IN (N'C1', N'C2')
      ),
      sorting (ID, ClassReference, StartDate, EndDate, Duration, CountBookings, RowNumStart, RowNumEnd) AS (
        SELECT ID, ClassReference, StartDate, EndDate, TotalDays, CountBookings
          , ROW_NUMBER() OVER (PARTITION BY ID, ClassReference, EndDate ORDER BY StartDate)
          , ROW_NUMBER() OVER (PARTITION BY ID, ClassReference, StartDate ORDER BY EndDate DESC)
        FROM comparison
        WHERE TotalDays = SumSingleDays
      ),
      counting (ID, ClassReference, StartDate, EndDate, Duration, Bookings) AS (
        SELECT ID, ClassReference, StartDate, EndDate, Duration
          , SUM(CountBookings) OVER (PARTITION BY ID, ClassReference)
        FROM sorting WHERE (RowNumStart = 1) AND (RowNumEnd = 1)
      )
    SELECT ID, ClassReference, StartDate, EndDate, Duration, Bookings
      , CASE 
          WHEN Duration >= 5 OR Bookings >= 2 THEN 'Resident' ELSE 'Visitor'
        END AS [Type]
    FROM counting
    ORDER BY ID, StartDate;
    

    Using this data for testing:

    Test Data

    With a filter for the last 12 months, the query returns:

    Result with ClassReference

    So Student 1 is "Resident" in class C2 but Visitor in Class C1.