I am doing some roster analysis and need to identify when an employee has worked for 5 or more consecutive days. In my table, I can extract data something like the below (note, there are lot more columns, this is just a cut down example):
Emp | Start | First_Entry |
---|---|---|
1234 | 23/06/2016 | 1 |
1234 | 24/06/2016 | 1 |
1234 | 24/06/2016 | 0 |
1234 | 25/06/2016 | 1 |
1234 | 26/06/2016 | 1 |
1234 | 27/06/2016 | 1 |
1234 | 28/06/2016 | 1 |
1234 | 29/06/2016 | 1 |
1234 | 29/06/2016 | 0 |
1234 | 30/06/2016 | 1 |
1234 | 2/07/2016 | 1 |
1234 | 3/07/2016 | 1 |
1234 | 3/07/2016 | 0 |
1234 | 4/07/2016 | 1 |
1234 | 4/07/2016 | 0 |
1234 | 5/07/2016 | 1 |
1234 | 6/07/2016 | 1 |
1234 | 9/07/2016 | 1 |
1234 | 10/07/2016 | 1 |
1234 | 11/07/2016 | 1 |
1234 | 12/07/2016 | 1 |
And what I am after is something like this:
Emp | Start | First_Entry | Consecutive_Days | Over_5 | Status |
---|---|---|---|---|---|
1234 | 23/06/2016 | 1 | 1 | 0 | Worked < 5 |
1234 | 24/06/2016 | 1 | 2 | 0 | Worked < 5 |
1234 | 24/06/2016 | 0 | 2 | 0 | Worked < 5 |
1234 | 25/06/2016 | 1 | 3 | 0 | Worked < 5 |
1234 | 26/06/2016 | 1 | 4 | 0 | Worked < 5 |
1234 | 27/06/2016 | 1 | 5 | 1 | Worked >= 5 |
1234 | 28/06/2016 | 1 | 6 | 1 | Worked >= 5 |
1234 | 29/06/2016 | 1 | 7 | 1 | Worked >= 5 |
1234 | 29/06/2016 | 0 | 7 | 1 | Worked >= 5 |
1234 | 30/06/2016 | 1 | 8 | 1 | Worked >= 5 |
1234 | 02/07/2016 | 1 | 1 | 0 | Worked < 5 |
1234 | 03/07/2016 | 1 | 2 | 0 | Worked < 5 |
1234 | 03/07/2016 | 0 | 2 | 0 | Worked < 5 |
1234 | 04/07/2016 | 1 | 3 | 0 | Worked < 5 |
1234 | 04/07/2016 | 0 | 3 | 0 | Worked < 5 |
1234 | 05/07/2016 | 1 | 4 | 0 | Worked < 5 |
1234 | 06/07/2016 | 1 | 5 | 1 | Worked >= 5 |
1234 | 09/07/2016 | 1 | 1 | 0 | Worked < 5 |
1234 | 10/07/2016 | 1 | 2 | 0 | Worked < 5 |
1234 | 11/07/2016 | 1 | 3 | 0 | Worked < 5 |
1234 | 12/07/2016 | 1 | 4 | 0 | Worked < 5 |
I'm really not sure how to go about getting the cumulative count for consecutive days, so any help you can give will be amazing
This is a island and gap problem, You can try to use LAG
window function to get the previous startDate
row for each Emp
, ten use SUM
window function to calculate which days are continuous.
Finally, We can use CASE WHEN
expression to judge whether the day is greater than 5.
;WITH CTE AS (
SELECT [Emp],
[startDate],
[First_Entry],
SUM(CASE WHEN DATEDIFF(dd,f_Dt,startDate) <= 1 THEN 0 ELSE 1 END) OVER(PARTITION BY Emp ORDER BY startDate) grp
FROM (
SELECT *,
LAG(startDate,1,startDate) OVER(PARTITION BY Emp ORDER BY startDate) f_Dt
FROM T
) t1
)
SELECT [Emp],
[startDate],
[First_Entry],
SUM(CASE WHEN First_Entry = 1 THEN 1 ELSE 0 END) OVER(PARTITION BY Emp,grp ORDER BY startDate) Consecutive_Days,
(CASE WHEN SUM(CASE WHEN First_Entry = 1 THEN 1 ELSE 0 END) OVER(PARTITION BY Emp,grp ORDER BY startDate) >= 5 THEN 1 ELSE 0 END) Over_5,
(CASE WHEN SUM(CASE WHEN First_Entry = 1 THEN 1 ELSE 0 END) OVER(PARTITION BY Emp,grp ORDER BY startDate) >= 5 THEN 'Worked >= 5' ELSE 'Worked < 5' END) Status
FROM CTE