Search code examples
sqlsql-servert-sqlrow-numbergaps-and-islands

SQL first order, then partition in over clause


I have a problem, that I want to partition over a sorted table. Is there a way I can do that?

I am using SQL Server 2016.

Input Table:

|---------|-----------------|-----------|------------|
|  prod   |   sortcolumn    |    type   |    value   |
|---------|-----------------|-----------|------------|
|    X    |        1        |     P     |     12     |
|    X    |        2        |     P     |     23     |
|    X    |        3        |     E     |     34     |
|    X    |        4        |     P     |     45     |
|    X    |        5        |     E     |     56     |
|    X    |        6        |     E     |     67     |
|    Y    |        1        |     P     |     78     |
|---------|-----------------|-----------|------------|

Desired Output

|---------|-----------------|-----------|------------|------------|
|  prod   |   sortcolumn    |    type   |    value   |    rowNr   |
|---------|-----------------|-----------|------------|------------|
|    X    |        1        |     P     |     12     |      1     |
|    X    |        2        |     P     |     23     |      2     |
|    X    |        3        |     E     |     34     |      1     |
|    X    |        4        |     P     |     45     |      1     |
|    X    |        5        |     E     |     56     |      1     |
|    X    |        6        |     E     |     67     |      2     |
|    Y    |        1        |     P     |     78     |      1     |
|---------|-----------------|-----------|------------|------------|

I am this far:

SELECT
  table.*,
  ROW_NUMBER() OVER(PARTITION BY table.prod, table.type ORDER BY table.sortColumn) rowNr
FROM table

But this does not restart the row number on the 4th row, since it is the same prod and type. How could I restart on every prod and also on every type change based on the sort criteria, even if the type changes back to something it already was previously? Is this even possible with a ROW_NUMBER function or do I have to work with LEAD and LAG and CASES (which would probably make it very slow, right?)

Thanks!


Solution

  • This is a gaps and islands problem. You can use the following query:

    SELECT t.*, 
           ROW_NUMBER() OVER (PARTITION BY prod ORDER BY sortcolumn)
           -
           ROW_NUMBER() OVER (PARTITION BY prod, type ORDER BY sortcolumn) AS grp
    FROM mytable t
    

    to get:

    prod    sortcolumn  type    value   grp
    ----------------------------------------
    X       1           P       12      0
    X       2           P       23      0
    X       3           E       34      2
    X       4           P       45      1
    X       5           E       56      3
    X       6           E       67      3
    Y       1           P       78      0
    

    Now, field grp can be used for partitioning:

    ;WITH IslandsCTE AS (
        SELECT t.*, 
               ROW_NUMBER() OVER (PARTITION BY prod ORDER BY sortcolumn)
               -
               ROW_NUMBER() OVER (PARTITION BY prod, type ORDER BY sortcolumn) AS grp
        FROM mytable t  
    )
    SELECT prod, sortcolumn, type, value,
           ROW_NUMBER() OVER (PARTITION BY prod, type, grp ORDER BY sortcolumn) AS rowNr
    FROM IslandsCTE
    ORDER BY prod, sortcolumn
    

    Demo here