Search code examples
sqlsql-servert-sqlsql-server-2014ranking

Ranking: How to reset the ROW_NUMBER or RANK to 1


Using SQL Server 2014:

Consider the following table:

DECLARE @Table TABLE (
      Id int NOT NULL identity(1,1),
      Col_Value varchar(2)
)

INSERT INTO @Table (Col_Value)
VALUES ('A'),('A'),('B'),('B'),('B'),('A'),('A'),('B'),('B'),('B'),('A'),('B'),('B'),('A'),('A'),('B'),('C'),('C'),('A'),('A'),('B'),('B'),('C')

How can I create a query that produces R column in the result like below

+----+------+---+
| ID | Data | R |
+----+------+---+
| 1  | A    | 1 |
+----+------+---+
| 2  | A    | 2 |
+----+------+---+
| 3  | B    | 1 |
+----+------+---+
| 4  | B    | 2 |
+----+------+---+
| 5  | B    | 3 |
+----+------+---+
| 6  | A    | 1 |
+----+------+---+
| 7  | A    | 2 |
+----+------+---+
| 8  | B    | 1 |
+----+------+---+
| 9  | B    | 2 |
+----+------+---+
| 10 | B    | 3 |
+----+------+---+
| 11 | A    | 1 |
+----+------+---+
| 12 | B    | 1 |
+----+------+---+
| 13 | B    | 2 |
+----+------+---+
| 14 | A    | 1 |
+----+------+---+
| 15 | A    | 2 |
+----+------+---+
| 16 | B    | 1 |
+----+------+---+
| 17 | C    | 1 |
+----+------+---+
| 18 | C    | 2 |
+----+------+---+
| 19 | A    | 1 |
+----+------+---+
| 20 | A    | 2 |
+----+------+---+
| 21 | B    | 1 |
+----+------+---+
| 22 | B    | 2 |
+----+------+---+
| 23 | C    | 1 |
+----+------+---+

In the above result table, once Data column changes in a row, the R value resets to 1

Update 1

Ben Thul's answer works very well.

I suggest below post be updated with a reference to this answer.

T-sql Reset Row number on Field Change


Solution

  • This is known as a "gaps and islands" problem in the literature. First, my proposed solution:

    with cte as (
        select *, [Id] - row_number() over (partition by [Col_Value] order by [Id]) as [GroupID]
        from @table
    )
    select [Id], [Col_Value], row_number() over (partition by [GroupID], [Col_Value] order by [Id])
    from cte
    order by [Id];
    

    For exposition, note that if I enumerate all of the "A" values using row_number(), those that are contiguous have the row_number() value go up at the same rate as the Id value. Which is to say that their difference will be the same for those in that contiguous group (also known as an "island"). Once we calculate that group identifier, it's merely a matter of enumerating each member per group.