Search code examples
sqlsql-servergaps-and-islands

Tag islands of records


I have this data set:

Id PrevId NextId Product Process Date
1 NULL 4 Product 1 Process A 2021-04-24
2 NULL 3 Product 2 Process A 2021-04-24
3 2 5 Product 2 Process A 2021-04-24
4 1 7 Product 1 Process B 2021-04-26
5 3 6 Product 2 Process B 2021-04-24
6 5 NULL Product 2 Process B 2021-04-24
7 4 9 Product 1 Process B 2021-04-29
9 7 10 Product 1 Process A 2021-05-01
10 9 15 Product 1 Process A 2021-05-03
15 10 19 Product 1 Process A 2021-05-04
19 15 NULL Product 1 Process C 2021-05-05

Per product, I need to tag consecutive/islands of records that have the same Process like:

Id PrevId NextId Product Process Date Tag
1 NULL 4 Product 1 Process A 2021-04-24 1
4 1 7 Product 1 Process B 2021-04-26 2
7 4 9 Product 1 Process B 2021-04-29 2
9 7 10 Product 1 Process A 2021-05-01 3
10 9 15 Product 1 Process A 2021-05-03 3
15 10 19 Product 1 Process A 2021-05-04 3
19 15 NULL Product 1 Process C 2021-05-05 4

A product goes through multiple Process-es and can go through the same one more than one time.

I basically need to produce the Tag column, the logic behind it is consecutive records with the same Process should be grouped together but a caveat is that the same process can appear further down the line but should be treated as a new group.

I have tried the basic windowing functions (ROW_NUMBER and DENSE_RANK) but the problem is that those count within the partition and not across partitions.


Solution

  • If you don't have to validate prevId and nextId (that is if your data is already correctly ordered) you could try the following:

    WITH cte AS(
    SELECT *
           , ROW_NUMBER() OVER (PARTITION BY Product ORDER BY [Date]) x
           , DENSE_RANK() OVER (PARTITION BY Product, Process ORDER BY [Date]) y
      FROM T1
      WHERE product = 'Product 1'
    ),
    cteTag AS(
    SELECT Id, PrevId, NextId, Product, Process, [Date], x-y AS Tag_
      FROM cte
    )
    SELECT Id, PrevId, NextId, Product, Process, [Date], DENSE_RANK() OVER (PARTITION BY Product ORDER BY Tag_) AS Tag
      FROM cteTag
    ORDER BY [Date]