I have the following scenario where I'm struggling to understand how to apply DENSE_RANK()
to get the result I want:
ID | Date | Value |
---|---|---|
1 | 1990-05-17 | 1.00 |
1 | 1991-10-12 | 1.00 |
1 | 1992-08-01 | 1.00 |
1 | 1993-07-05 | 0.67 |
1 | 1994-05-02 | 0.67 |
1 | 1995-02-01 | 1.00 |
1 | 1996-03-01 | 1.00 |
Based on the above data, I'm trying to identify distinct periods using the combination of the Date
and Value
columns, where a unique period is identified from where the Value
column changes from one value to another. Here's the result I'm looking for:
ID | Date | Value | Period |
---|---|---|---|
1 | 1990-05-17 | 1.00 | 1 |
1 | 1991-10-12 | 1.00 | 1 |
1 | 1992-08-01 | 1.00 | 1 |
1 | 1993-07-05 | 0.67 | 2 |
1 | 1994-05-02 | 0.67 | 2 |
1 | 1995-02-01 | 1.00 | 3 |
1 | 1996-03-01 | 1.00 | 3 |
As you can see, there are 3 distinct periods. The problem I am having is that when I use DENSE_RANK()
, I get one of two outcomes:
SELECT DENSE_RANK() OVER (PARTITION BY ID ORDER BY Date, Value)
ID | Date | Value | Period |
---|---|---|---|
1 | 1990-05-17 | 1.00 | 1 |
1 | 1991-10-12 | 1.00 | 2 |
1 | 1992-08-01 | 1.00 | 3 |
1 | 1993-07-05 | 0.67 | 4 |
1 | 1994-05-02 | 0.67 | 5 |
1 | 1995-02-01 | 1.00 | 6 |
1 | 1996-03-01 | 1.00 | 7 |
SELECT DENSE_RANK() OVER (PARTITION BY ID ORDER BY Value)
ID | Date | Value | Period |
---|---|---|---|
1 | 1990-05-17 | 1.00 | 1 |
1 | 1991-10-12 | 1.00 | 1 |
1 | 1992-08-01 | 1.00 | 1 |
1 | 1993-07-05 | 0.67 | 2 |
1 | 1994-05-02 | 0.67 | 2 |
1 | 1995-02-01 | 1.00 | 1 |
1 | 1996-03-01 | 1.00 | 1 |
As you can see, the problem lies with the Date
column as I need that to be a cumulative period. Furthermore, the amount of periods will vary from ID
to ID
and there's no consistent science behind the Date
column. A member could have two entries in one year for example.
You can use LAG()
window function to get for each row its previous Value and with conditional aggregation with SUM()
window function get the Periods
:
SELECT ID, Date, Value,
SUM(CASE WHEN VALUE = prev_value THEN 0 ELSE 1 END) OVER (PARTITION BY ID ORDER BY Date) Period
FROM (
SELECT *, LAG(Value) OVER (PARTITION BY ID ORDER BY Date) prev_value
FROM tablename
) t
ORDER BY Date;
See the demo.