Search code examples
sqlsql-servercounter

Microsoft SQL Server Streak Counter


I have data in the following format:

ID Period Date
1 1 2020-09-05
1 2 2020-10-01
1 3 2020-10-30
2 2 2020-10-01
2 4 2020-12-05
2 6 2021-02-05
3 1 2020-09-05

I want to calculate a streak grouping by both ID and period. The first date instance for each group should start at one, and if the date in the next row is less than 30 days from the first row, +1 to the streak counter. This should reset to 1 if the next row is 30 or more days.

This is my desired output:

ID Period Date Counter
1 1 2020-09-05 1
1 2 2020-10-01 2
1 3 2020-10-30 3
2 2 2020-10-01 1
2 4 2020-12-05 1
2 6 2021-02-05 1
3 1 2020-09-05 1

I'm completely stuck on this - any help would be greatly appreciated. I'm using Microsoft SQL server.

EDIT: Actually using Microsoft SQL Server


Solution

  • From Oracle 12c, you can use MATCH_RECOGNIZE to perform a row-by-row comparison:

    SELECT id,
           period,
           "DATE",
           counter
    FROM   table_name
    MATCH_RECOGNIZE(
      PARTITION BY id
      ORDER     BY "DATE"
      MEASURES
        COUNT(*) AS counter
      ALL ROWS PER MATCH
      PATTERN (any_date streak_date*)
      DEFINE
        streak_date AS "DATE" <= PREV("DATE") + INTERVAL '30' DAY
                  -- AND PREV(period) < period
    )
    

    Which, for the sample data:

    CREATE TABLE table_name (ID, Period, "DATE") AS
    SELECT 1, 1, DATE '2020-09-05' FROM DUAL UNION ALL
    SELECT 1, 2, DATE '2020-10-01' FROM DUAL UNION ALL
    SELECT 1, 3, DATE '2020-10-30' FROM DUAL UNION ALL
    SELECT 2, 2, DATE '2020-10-01' FROM DUAL UNION ALL
    SELECT 2, 4, DATE '2020-12-05' FROM DUAL UNION ALL
    SELECT 2, 6, DATE '2021-02-05' FROM DUAL UNION ALL
    SELECT 3, 1, DATE '2020-09-05' FROM DUAL
    

    Note: DATE is a reserved word and it is bad practice to use it as a column name.

    Outputs:

    ID PERIOD DATE COUNTER
    1 1 2020-09-05 00:00:00 1
    1 2 2020-10-01 00:00:00 2
    1 3 2020-10-30 00:00:00 3
    2 2 2020-10-01 00:00:00 1
    2 4 2020-12-05 00:00:00 1
    2 6 2021-02-05 00:00:00 1
    3 1 2020-09-05 00:00:00 1

    db<>fiddle here