Search code examples
sqldatabasewindow-functionsimpalagaps-and-islands

A cumulative sum of consecutive workdays that resets to 1 when consecutive days = 0, per ID


I have 3 columns:

Employee ID(numerical)

Day of work(a date yyyy-mm-dd when employee had a shift)

is_consecutive_work_day (1 if days of work are consecutive, else 0)

I need a 4th: Consecutive_work_days (a cumulative sum of is_consecutive_work_day, which resets to 1 when is_consecutive_work_day = 0). So this will go to a maximum of 5 for any employee id. Some will have 1,2,3 others 1,2...etc.

What am failing to figure out is how to write the 4th column (consecutive_work_days). Not how to write a consecutive sum per employee id, but specifically how to reset to 1 when is_consecutive_work_day = 0 per employee id.

May I ask for your help regarding this 4th column please? Thanks.


Solution

  • You can use window functions. lag() lets you access the previous day_of_work for the same employee, which you can compare to the current day_of_work: if there is a one day difference, then you can set is_consecutive_work_day to 1.

    select
        employee_id,
        day_of_work,
        case 
            when day_of_work 
                = lag(day_of_work) over(partition by employee_id order by day_of_work) 
                    + interval 1 day
            then 1
            else 0
        end is_consecutive_work_day 
    from mytable
    

    To compute the cumulative sum, it is a bit more complicated. We can use some gaps-and-island technique to put each record in the group it belongs to: basically, everytime is_consecutive_work_day of 0 is met, a new group starts; we can then do a window sum() over each group:

    select 
        employee_id,
        day_of_work,
        is_consecutive_work_day,
        sum(is_consecutive_work_day) 
            over(partition by employee_id, grp order by day_of_work)
            consecutive_work_days 
    from (
        select 
            t.*,
            sum(1 - is_consecutive_work_day) over(partition by employee_id order by day_of_work)  grp
        from (
            select
                t.*,
                case 
                    when day_of_work 
                        = lag(day_of_work) over(partition by employee_id order by day_of_work) 
                            + interval 1 day
                    then 1
                    else 0
                end is_consecutive_work_day 
            from mytable t
        ) t
    ) t