Search code examples
sqlsql-servercommon-table-expressiongaps-and-islandsdate-arithmetic

How to write "streak" query?


I have a table named MyDates with columns Id (INT) and MyDate (DATE). MyDates can contain several rows with the same date.

I want to count the number of consecutive workdays, starting from today and going backwards in time, where a workday is defined as any weekday from Monday to Friday.

I have tried to write a query using cte (common table expression), but I am stuck

Lets say that today is 27. April 2023 The MyDates table contains the following seven rows:

1, 2023-04-17
2, 2023-04-21
3, 2023-04-24
4, 2023-04-24
5, 2023-04-25
6, 2023-04-26
7, 2023-04-27

I want the query to return Streak = 5

That is because the streak should count the four dates from Thursday 2023-14-27 to Monday 2023-04-24, plus Friday 2023-04-21.

The query should ignore Saturday 2023-04-22 and Sunday 2023-04-23, and it should ignore the fact that there are to rows for Monday 2023-04-24

SELECT MIN(date) AS start_date, MAX(date) AS end_date, COUNT(*) AS streak_length
FROM (
  SELECT date, value, ROW_NUMBER() OVER (ORDER BY date) AS rn, 
         ROW_NUMBER() OVER (PARTITION BY value >= threshold, is_weekend ORDER BY date) AS grp
  FROM (
    SELECT date, value,
           CASE WHEN DAYOFWEEK(date) IN (1, 7) THEN 1 ELSE 0 END AS is_weekend
    FROM table_name
  ) sub
) sub
WHERE value >= threshold
GROUP BY value, grp - rn
ORDER BY start_date;

Solution

  • Here is a step-by-step approach at this gaps-and-islands problem.

    We would start by deduplicating the dates, and associating the "previous" date with lag(); just in case, we can also exclude dates that fall on week-ends, if any:

    select date, lag(date) over(order by date) lag_date
    from mytable
    where datepart(weekday, date) not in (1, 7)
    group by date
    

    From there on, we can use a conditional sum to define groups of "adjacent" records (aka islands) ; the trick is to adjust the date arithmetic when we meet a week-end:

    select t.*, 
        sum(
            case when date = dateadd(
                day, 
                case when datepart(weekday, lag_date) = 6 then 3 else 1 end,
                lag_date
            ) then 0 else 1 end) over(order by date desc) grp
    from (
        select date, lag(date) over(order by date) lag_date
        from mytable
        group by date
    ) t
        
    

    The final step is to retain the first group only (that corresponds to the most recent streak):

    select min(lag_date) streak_start, max(date) streak_end,
        count(*) + 1 streak_length
    from (
        select t.*, 
            sum(
                case when date = dateadd(
                    day, 
                    case when datepart(weekday, lag_date) = 6 then 3 else 1 end,
                    lag_date
                ) then 0 else 1 end) over(order by date desc) grp
        from (
            select date, lag(date) over(order by date) lag_date
            from mytable
            where datepart(weekday, date) not in (1, 7)
            group by date
        ) t
    ) t
    where grp = 0
    
    streak_start streak_end streak_length
    2023-04-21 2023-04-27 5

    fiddle


    Important note: dayofweek, that is used in your original query, is not a thing in SQL Server. We use datepart(weekday, ...) instead. However please note that the result of this function depend on how you set the datefirst parameter. The answer here assumes set datefirst 7 (meaning that dates weeks start on Sunday, as implied by your original code).