Search code examples
sqlsnowflake-cloud-data-platformwindow-functionslaggaps-and-islands

Lag() to get what a value changes to snowflake


I have a table that has data that looks like:

ID NAME DATE ORDER
15 MICHELLE 2023-01-01 2
15 MICHELLE 2023-01-04 3
15 MITCH 2023-01-15 7

(this is a small sample of my table data for a specific id)

I would like to get a table the gives back the ID, and what the value was at a specific point in time/ what it changed to

For example, I would like my table to look like

ID NAME FROM_DATE TO_DATE
15 MICHELLE 2023-01-01 2023-01-15
15 MITCH 2023-01-15 2023-03-10

Where for the last value given an id, its TO_DATE would be the current date.

I've done

SELECT ID, 
       LAG(NAME) OVER (PARTITION BY ID ORDER BY ORDER) AS NAME,
       LAG(DATE) OVER (PARTITION BY ID, ORDER BY ORDER) AS FROM_DATE, 
       DATE AS TO_DATE
FROM MY_TABLE

However this is returning

ID NAME FROM_DATE TO_DATE
15 MICHELLE 2023-01-01 2023-01-04
15 MICHELLE 2023-01-04 2023-01-15

Rather than just putting MICHELLE into one row since there is no change from 2023-01-01 to 2023-01-15 and I am unable to show that the name has changed from MICHELLE to MITCH and that change has continued until today's date (since there is not record that comes after it)

Is there a way I can do this? Thanks!


Solution

  • You can use the lead window function on the min(DATE) aggregate like this:

    select ID, NAME, min(DATE) as FROM_DATE
          ,lead(FROM_DATE, 1, current_date) 
             over (partition by ID order by FROM_DATE) as TO_DATE
    from MY_TABLE
    group by ID, NAME;
    

    Here's a self-contained sample:

    with MY_TABLE as
        (
        select 
        COLUMN1::int as "ID",
        COLUMN2::string as "NAME",
        COLUMN3::date as "DATE",
        COLUMN4::int as "ORDER"
        from (values
        ('15','MICHELLE','2023-01-01','2'),
        ('15','MICHELLE','2023-01-04','3'),
        ('15','MITCH','2023-01-15','7')
        )
    )
    select ID, NAME, min(DATE) as FROM_DATE, lead(FROM_DATE, 1, current_date) over (partition by ID order by FROM_DATE) as TO_DATE
    from MY_TABLE
    group by ID, NAME;
    

    Edit: If the names can change back to the names they were before with the same ID, here's an example with more test data and updated solution for that. It uses the conditional_change_event function to form new groups. It sorts the dates descending to make it easy to know which rows are at the end and need to be change to the current_date:

    with MY_TABLE as
        (
        select 
        COLUMN1::int as "ID",
        COLUMN2::string as "NAME",
        COLUMN3::date as "DATE",
        COLUMN4::int as "ORDER"
        from (values
        ('15','MICHELLE','2023-01-01','2'),
        ('15','MICHELLE','2023-01-04','3'),
        ('15','MICHELLE','2023-01-05','4'),
        ('15','MITCH',   '2023-01-15','7'),
        ('15','MICHELLE','2023-02-04','9'),
        ('16','BOB','2023-01-01','8')
        )
    )
    select   ID
            ,NAME
            ,min("DATE") as START_DATE
            ,iff(NAME_CHANGE = 0, current_date, max("DATE")) as END_DATE
    from (
        select    *
            ,conditional_change_event("NAME") over (partition by ID order by "DATE" desc) NAME_CHANGE 
        from     MY_TABLE
    )
    group by ID, NAME, NAME_CHANGE
    order by ID, START_DATE
    ;
    
    ID NAME START_DATE END_DATE
    15 MICHELLE 2023-01-01 00:00:00 2023-01-05
    15 MITCH 2023-01-15 00:00:00 2023-01-15
    15 MICHELLE 2023-02-04 00:00:00 2023-03-11
    16 BOB 2023-01-01 00:00:00 2023-03-11