Search code examples
sqlsql-servert-sqlwindow-functions

Windowing Events


Take a table like such:

ID KEYID RECVRID STAGE STAGEDATE
834 3 61 PUBLISHEDDATE Sunday, November 19, 2023
834 3 61 VIEWEDDATE Monday, November 20, 2023
92 3 61 PUBLISHEDDATE Monday, November 20, 2023
526 3 61 PUBLISHEDDATE Wednesday, November 22, 2023
526 3 61 VIEWEDDATE Sunday, November 26, 2023
835 3 61 PUBLISHEDDATE Tuesday, November 28, 2023
835 3 61 VIEWEDDATE Tuesday, November 30, 2023

This is an item (KEYID) going through a series of stages where the date that each stage is hit gets recorded (STAGEDATE). The item always starts with a PUBLISHEDDATE and gets assigned an ID. The item can then move on to one of two paths. In the first path it gets viewed and a VIEWEDDATE is recorded. After the VIEWEDDATE is recorded it gets recycled and is assigned a new ID and new PUBLISHEDDATE. In the second path it never gets viewed and gets recycled which also assigns it a new ID and PUBLISHEDDATE.

What I'm trying to do is build a new column that orders the table by date and assigns a sequential number every time the item is recycled. So, what Id' like to have is:

ID KEYID RECVRID STAGE STAGEDATE HISTORY
834 3 61 PUBLISHEDDATE Sunday, November 19, 2023 1
834 3 61 VIEWEDDATE Monday, November 20, 2023 1
92 3 61 PUBLISHEDDATE Monday, November 20, 2023 2
526 3 61 PUBLISHEDDATE Wednesday, November 22, 20232 3
526 3 61 VIEWEDDATE Sunday, November 26, 2023 3
835 3 61 PUBLISHEDDATE Tuesday, November 28, 2023 4
835 3 61 VIEWEDDATE Tuesday, November 30, 2023 4

I've tried doing a DENSE_RANK but I can't rank on ID because there's no guarantee the next ID will be higher than the last.

DENSE_RANK() OVER(PARTITION BY KEYID, RECVRID ORDER BY STAGEDATE) AS HISTORY

Solution

  • As I interpret your corrected data and desired results, what you really want is a new HISTORY sequence number for each distinct ID value, ordered by the earliest date for that ID (STAGE can actually be ignored). Different KEYID values (not shown in the sample data) would get their own HISTORY sequence.

    I believe the following will do the trick.

    SELECT D.*, H.HISTORY
    FROM Data D
    JOIN (
        SELECT
            KEYID, ID,
            ROW_NUMBER() OVER(PARTITION BY KEYID ORDER BY MIN(STAGEDATE)) AS HISTORY
        FROM Data D
        GROUP BY KEYID, ID
    ) H ON H.KEYID = D.KEYID AND H.ID = D.ID
    ORDER BY STAGEDATE, HISTORY
    

    If you really want to focus on just the PUBLISHEDDATE stages, you can add WHERE STAGE = 'PUBLISHEDDATE' or change the MIN(STAGEDATE) to the conditional aggregation expression MIN(CASE WHEN STAGE = 'PUBLISHEDDATE' THEN STAGEDATE END). All variants produce the same results on the current data. Results may be different if an ID value exists without a 'PUBLISHEDDATE' row.

    Results:

    ID KEYID RECVRID STAGE STAGEDATE HISTORY
    834 3 61 PUBLISHEDDATE 2023-11-19 1
    834 3 61 VIEWEDDATE 2023-11-20 1
    92 3 61 PUBLISHEDDATE 2023-11-20 2
    526 3 61 PUBLISHEDDATE 2023-11-22 3
    526 3 61 VIEWEDDATE 2023-11-26 3
    835 3 61 PUBLISHEDDATE 2023-11-28 4
    835 3 61 VIEWEDDATE 2023-11-30 4

    See this db<>fiddle.