Take a table like such:
ID | KEYID | RECVRID | STAGE | STAGEDATE |
---|---|---|---|---|
834 | 3 | 61 | PUBLISHEDDATE | Sunday, November 19, 2023 |
834 | 3 | 61 | VIEWEDDATE | Monday, November 20, 2023 |
92 | 3 | 61 | PUBLISHEDDATE | Monday, November 20, 2023 |
526 | 3 | 61 | PUBLISHEDDATE | Wednesday, November 22, 2023 |
526 | 3 | 61 | VIEWEDDATE | Sunday, November 26, 2023 |
835 | 3 | 61 | PUBLISHEDDATE | Tuesday, November 28, 2023 |
835 | 3 | 61 | VIEWEDDATE | Tuesday, November 30, 2023 |
This is an item (KEYID) going through a series of stages where the date that each stage is hit gets recorded (STAGEDATE). The item always starts with a PUBLISHEDDATE and gets assigned an ID. The item can then move on to one of two paths. In the first path it gets viewed and a VIEWEDDATE is recorded. After the VIEWEDDATE is recorded it gets recycled and is assigned a new ID and new PUBLISHEDDATE. In the second path it never gets viewed and gets recycled which also assigns it a new ID and PUBLISHEDDATE.
What I'm trying to do is build a new column that orders the table by date and assigns a sequential number every time the item is recycled. So, what Id' like to have is:
ID | KEYID | RECVRID | STAGE | STAGEDATE | HISTORY |
---|---|---|---|---|---|
834 | 3 | 61 | PUBLISHEDDATE | Sunday, November 19, 2023 | 1 |
834 | 3 | 61 | VIEWEDDATE | Monday, November 20, 2023 | 1 |
92 | 3 | 61 | PUBLISHEDDATE | Monday, November 20, 2023 | 2 |
526 | 3 | 61 | PUBLISHEDDATE | Wednesday, November 22, 20232 | 3 |
526 | 3 | 61 | VIEWEDDATE | Sunday, November 26, 2023 | 3 |
835 | 3 | 61 | PUBLISHEDDATE | Tuesday, November 28, 2023 | 4 |
835 | 3 | 61 | VIEWEDDATE | Tuesday, November 30, 2023 | 4 |
I've tried doing a DENSE_RANK but I can't rank on ID because there's no guarantee the next ID will be higher than the last.
DENSE_RANK() OVER(PARTITION BY KEYID, RECVRID ORDER BY STAGEDATE) AS HISTORY
As I interpret your corrected data and desired results, what you really want is a new HISTORY
sequence number for each distinct ID
value, ordered by the earliest date for that ID
(STAGE
can actually be ignored). Different KEYID
values (not shown in the sample data) would get their own HISTORY
sequence.
I believe the following will do the trick.
SELECT D.*, H.HISTORY
FROM Data D
JOIN (
SELECT
KEYID, ID,
ROW_NUMBER() OVER(PARTITION BY KEYID ORDER BY MIN(STAGEDATE)) AS HISTORY
FROM Data D
GROUP BY KEYID, ID
) H ON H.KEYID = D.KEYID AND H.ID = D.ID
ORDER BY STAGEDATE, HISTORY
If you really want to focus on just the PUBLISHEDDATE stages, you can add WHERE STAGE = 'PUBLISHEDDATE'
or change the MIN(STAGEDATE)
to the conditional aggregation expression MIN(CASE WHEN STAGE = 'PUBLISHEDDATE' THEN STAGEDATE END)
. All variants produce the same results on the current data. Results may be different if an ID
value exists without a 'PUBLISHEDDATE'
row.
Results:
ID | KEYID | RECVRID | STAGE | STAGEDATE | HISTORY |
---|---|---|---|---|---|
834 | 3 | 61 | PUBLISHEDDATE | 2023-11-19 | 1 |
834 | 3 | 61 | VIEWEDDATE | 2023-11-20 | 1 |
92 | 3 | 61 | PUBLISHEDDATE | 2023-11-20 | 2 |
526 | 3 | 61 | PUBLISHEDDATE | 2023-11-22 | 3 |
526 | 3 | 61 | VIEWEDDATE | 2023-11-26 | 3 |
835 | 3 | 61 | PUBLISHEDDATE | 2023-11-28 | 4 |
835 | 3 | 61 | VIEWEDDATE | 2023-11-30 | 4 |
See this db<>fiddle.