Search code examples
sql-serversumwindow-functionsdense-rank

DENSE_RANK on two columns, where one column are distinct values and the other has duplicate


I have the following scenario where I'm struggling to understand how to apply DENSE_RANK() to get the result I want:

ID Date Value
1 1990-05-17 1.00
1 1991-10-12 1.00
1 1992-08-01 1.00
1 1993-07-05 0.67
1 1994-05-02 0.67
1 1995-02-01 1.00
1 1996-03-01 1.00

Based on the above data, I'm trying to identify distinct periods using the combination of the Date and Value columns, where a unique period is identified from where the Value column changes from one value to another. Here's the result I'm looking for:

ID Date Value Period
1 1990-05-17 1.00 1
1 1991-10-12 1.00 1
1 1992-08-01 1.00 1
1 1993-07-05 0.67 2
1 1994-05-02 0.67 2
1 1995-02-01 1.00 3
1 1996-03-01 1.00 3

As you can see, there are 3 distinct periods. The problem I am having is that when I use DENSE_RANK(), I get one of two outcomes:

SELECT DENSE_RANK() OVER (PARTITION BY ID ORDER BY Date, Value)

ID Date Value Period
1 1990-05-17 1.00 1
1 1991-10-12 1.00 2
1 1992-08-01 1.00 3
1 1993-07-05 0.67 4
1 1994-05-02 0.67 5
1 1995-02-01 1.00 6
1 1996-03-01 1.00 7

SELECT DENSE_RANK() OVER (PARTITION BY ID ORDER BY Value)

ID Date Value Period
1 1990-05-17 1.00 1
1 1991-10-12 1.00 1
1 1992-08-01 1.00 1
1 1993-07-05 0.67 2
1 1994-05-02 0.67 2
1 1995-02-01 1.00 1
1 1996-03-01 1.00 1

As you can see, the problem lies with the Date column as I need that to be a cumulative period. Furthermore, the amount of periods will vary from ID to ID and there's no consistent science behind the Date column. A member could have two entries in one year for example.


Solution

  • You can use LAG() window function to get for each row its previous Value and with conditional aggregation with SUM() window function get the Periods:

    SELECT ID, Date, Value,
           SUM(CASE WHEN VALUE = prev_value THEN 0 ELSE 1 END) OVER (PARTITION BY ID ORDER BY Date) Period 
    FROM (
      SELECT *, LAG(Value) OVER (PARTITION BY ID ORDER BY Date) prev_value
      FROM tablename
    ) t
    ORDER BY Date;
    

    See the demo.