Search code examples
sqlsql-serversql-server-2000gaps-and-islands

Find consecutive working dates for each employee


In SQL 2000 I have a table that contains the following:

ID Date WorkingTime EmployeeID

For August, this table would contain 200 employees with dates of 8/1 - 8/31. I need to find out what is the MIN date of the first 5 consecutive days of working time for each employee starting at the day passed in and going backward.

For Example: If employee 123 looked as follows and 8/10/2013 was passed in:

ID Date WorkingTime EmployeeID
1  8/1      1           123 
2  8/2      0           123
3  8/3      0           123
4  8/4      1           123
5  8/5      1           123
6  8/6      1           123
7  8/7      1           123
8  8/8      1           123
9  8/9      0           123
10 8/10     1           123

The result would be 8/4. This needs to be done all at once for all of the employees in the table, so they would all have different min dates, all starting on 8/10 since that was the date that was passed into the query. This table is very large in real life and contins many dates and employees, not just in Auguest. I thought about using a cursor to go through all of this but I think that would be really slow. I was also thinking of adding all of the working times to a temp table and doing a datediff on them to find the consecutive 5 with a datediff of 1, but I wasn't quite sure how to execute that. Is there a better way I am not thinking of?


Solution

  • DECLARE @MyTable TABLE
    (
        ID      INT IDENTITY PRIMARY KEY,
        [Date]  SMALLDATETIME NOT NULL,
        WorkingTime INT NOT NULL,
        EmployeeID  INT NOT NULL
    );
    INSERT  @MyTable ([Date], WorkingTime, EmployeeID)
    -- First employee
    SELECT  '20130801', 1, 123 UNION ALL 
    SELECT  '20130802', 0, 123 UNION ALL 
    SELECT  '20130803', 0, 123 UNION ALL 
    SELECT  '20130804', 1, 123 UNION ALL 
    SELECT  '20130805', 1, 123 UNION ALL 
    SELECT  '20130806', 1, 123 UNION ALL 
    SELECT  '20130807', 1, 123 UNION ALL 
    SELECT  '20130808', 1, 123 UNION ALL 
    SELECT  '20130809', 0, 123 UNION ALL 
    SELECT  '20130810', 1, 123 UNION ALL
    -- Second employee
    SELECT  '20130801', 1, 126 UNION ALL 
    SELECT  '20130802', 1, 126 UNION ALL 
    SELECT  '20130803', 1, 126 UNION ALL 
    SELECT  '20130804', 1, 126 UNION ALL 
    SELECT  '20130805', 1, 126 UNION ALL 
    SELECT  '20130806', 0, 126 UNION ALL 
    -- Third employee
    SELECT  '20130801', 0, 127 UNION ALL 
    SELECT  '20130802', 0, 127 UNION ALL 
    SELECT  '20130803', 1, 127 UNION ALL 
    SELECT  '20130804', 1, 127 UNION ALL 
    SELECT  '20130805', 0, 127 UNION ALL 
    SELECT  '20130806', 0, 127; 
    

    --

    DECLARE @Results TABLE
    (
        EmployeeID  INT NOT NULL,
        DaysDiff    INT NOT NULL,
        PRIMARY KEY(EmployeeID, DaysDiff), -- This is a "clustered index"/index organized table
        RowNum      INT IDENTITY NOT NULL,
        [Date]      SMALLDATETIME NOT NULL
    );
    INSERT  @Results (EmployeeID, DaysDiff, [Date])
    SELECT  x.EmployeeID,   
            DATEDIFF(DAY, 0, x.[Date]) AS DaysDiff,
            x.[Date]
    FROM    @MyTable x
    WHERE   x.WorkingTime = 1
    /*
    This ORDER BY clause and the clustered index (PRIMARY KEY(EmployeeID, DaysDiff))
    should give a hint to SQL Server so that 
    RowNum IDENTITY values will be generated in this order: EmployeeID, DaysDiff
    
    Note #1: There is not 100% guarantee that insert order will be the same as 
    ORDER BY x.EmployeeID, DaysDiff
    and 
    clustered index key (EmployeeID, DaysDiff)
    
    Note #2: This INSERT INTO table with identity column simulates the ROW_NUMBER function
    which is available starting with SQL2005.
    */
    ORDER BY x.EmployeeID, DaysDiff
    OPTION (MAXDOP 1); -- It minimizes the risk of messing up the order of RowNum
    
    SELECT  y.EmployeeID, MAX(y.GroupStartDate) AS FirstGroupStartDate
    FROM
    (
        SELECT  x.EmployeeID, x.GroupID, 
                MIN(x.[Date]) AS GroupStartDate, MAX(x.[Date]) AS GroupEndDate,
                DATEDIFF(DAY, MIN(x.[Date]), MAX(x.[Date]))+1 AS ContinuousDays
        FROM
        (
            SELECT  *, r.DaysDiff - r.RowNum AS GroupID
            FROM    @Results r
        ) x
        GROUP BY x.EmployeeID, x.GroupID
    ) y
    WHERE y.ContinuousDays > 4
    GROUP BY y.EmployeeID;