Search code examples
maxsnowflake-cloud-data-platformself-join

Join a table onto itself


I have a table like this where TIMELINE is just an overall series of dates in order and DATE that I've matched to the dates in TIMELINE and contains actual records of dates that have values (shown in the NUMBER column)

TIMELINE DATE NUMBER
2022-03-03 2022-03-03 NULL
2022-03-04 2022-03-04 40
2022-03-07 NULL NULL
2022-03-08 NULL NULL
2022-06-08 2022-06-08 45
2022-06-28 2022-06-28 NULL
2022-06-29 NULL NULL
2022-06-30 NULL NULL
2022-07-08 2022-07-08 80

I am trying to fill in this table so that for every DATE that is NULL, its NUMBER becomes the value of the most recent/max DATE that is <= to the current date of the row I am looking at. So for example, in the third row, the DATE is NULL so its NUMBER would become the value (40) that is associated with the date 2022-03-04 and the same thing for fourth row. Then for the third to last and the second to last row, its NUMBER would be NULL because that is the value associated with the most recent date before that row.

I am trying to get an output like this:

TIMELINE DATE NUMBER
2022-03-03 2022-03-03 NULL
2022-03-04 2022-03-04 40
2022-03-07 NULL 40
2022-03-08 NULL 40
2022-06-08 2022-06-08 45
2022-06-28 2022-06-28 NULL
2022-06-29 NULL NULL
2022-06-30 NULL NULL
2022-07-08 2022-07-08 80

Would this be achieved by joining the table onto itself?/Is there a function that would help me do this? Thanks!


Solution

  • Here's an example of the update query that would produce the desired results. Although I've used your reserved words, I would suggest avoiding those in your modeling.

    --
    -- Create test table
    --
    create
    or replace table x (timeline date, date date, number integer);
    
    --
    -- load test data
    --
    insert into
        x
    values
        ('2022-03-03'::date,'2022-03-03'::date,NULL), ('2022-03-04'::date,'2022-03-04'::date,40), ('2022-03-07'::date,NULL,NULL), ('2022-03-08'::date,NULL,NULL), ('2022-06-08'::date,'2022-06-08'::date,45),('2022-06-28'::date,'2022-06-28'::date,NULL), ('2022-06-29'::date,NULL,NULL), ('2022-06-30'::date,NULL,NULL), ('2022-07-08'::date,'2022-07-08'::date,80);
        
        --
        -- Update the original table via a correlation
        --
    
    update
        x
    set
        x.number = z.number
    from
        (
            select
                a.timeline,
                a.date,
                case
                    when (a.date is NULL) then b.number
                    else a.number
                end number
            from
                x a
                left join x b on b.timeline = (
                    select
                        max(timeline)
                    from
                        x
                    where
                        timeline < a.timeline
                        and date is not null
                )
        ) z
    where
        x.timeline = z.timeline
        and x.date = z.date;
        
        --
        -- Results
        --
        select * from x order by 1;
    

    Results:

    TIMELINE    DATE        NUMBER
    2022-03-03  2022-03-03  
    2022-03-04  2022-03-04  40
    2022-03-07              40
    2022-03-08              40
    2022-06-08  2022-06-08  45
    2022-06-28  2022-06-28  
    2022-06-29      
    2022-06-30      
    2022-07-08  2022-07-08  80