Search code examples
sqlsnowflake-cloud-data-platformwindow-functions

Counting total IDs in a table for changing dates using Snowflake


So let's say I have some data as follows. It is a table where the distinct identifier for each row is the ID + the MONTH_END.

TABLE1

ID  MONTH_END        data
001 2022-03-31      grape
001 2022-04-30     orange
001 2022-05-31  pineapple
002 2021-01-31      berry
002 2021-02-28     banana
003 2019-02-28     orange   
003 2021-12-31      grape     
003 2022-01-31     banana      
003 2019-02-28     orange    

My goal here is I want to make a new variable called frequency where I take a new table and determine how many distinct times the ID appears before and during the MONTH_END in table1.

TABLE2

ID   freq_date  freq_val
001 2022-03-31         4
002 2022-02-20       242
003 2021-12-31       122
004 2020-07-02       342
001 2021-08-23        44
001 2021-01-13      7347
003 2021-01-21       212

Additionally, I would like the sum of the freq_vals in the same fashion called freq_val2.

Final expected output:

ID  MONTH_END        data  frequency freq_val2
001 2022-03-31      grape          3      7395  
001 2022-04-30     orange          3      7395
001 2022-05-31  pineapple          3      7395
002 2021-01-31      berry          0         0
002 2021-02-28     banana          1       242
003 2019-02-28     orange          0         0
003 2021-12-31      grape          1       122
003 2022-01-31     banana          2       334
003 2019-02-28     orange          2       334

Solution

  • I think the logic is correct in the query, but I'm unable to obtain the same results as shown above - please read my comment regarding some questions about your test data (duplicate rows, and values that exceed the logic threshold described).

    -- 
    -- Create tables and populate with test data 
    --
    create
    or replace table table_1 (id varchar, month_end date, data varchar);
    
    insert into
        table_1
    values
        ('001','2022-03-31'::date,'grape'), 
        ('001','2022-04-30'::date,'orange'), 
        ('001','2022-05-31'::date,'pineapple'), 
        ('002','2021-01-31'::date,'berry'), 
        ('002','2021-02-28'::date,'banana'), 
        ('003','2019-02-28'::date,'orange'), 
        ('003','2021-12-31'::date,'grape'), 
        ('003','2022-01-31'::date,'banana'), 
        ('003','2019-02-28'::date,'orange'); 
        
    create
        or replace table table_2 (id varchar, freq_date date, freq_val integer);
    insert into
        table_2
    values
        ('001','2022-03-31'::date, 4),
        ('002','2022-02-20'::date, 242),
        ('003','2021-12-31'::date, 122),
        ('004','2020-07-02'::date, 342),
        ('001','2021-08-23'::date, 44),
        ('001','2021-01-13'::date, 7347),
        ('003','2021-01-21'::date, 212);
    
    --
    -- Results query
    --
    select
        t1.id,
        t1.month_end,
        t1.data,
        count(t2.id) as frequency,
        sum(nvl(t2.freq_val, 0)) as freq_val2
    from
        table_1 t1
        left join table_2 t2 on t1.id = t2.id
        and t2.freq_date <= t1.month_end
    group by
        1,
        2,
        3
    order by
        1,
        2; 
    

    Results:

    ID  MONTH_END   DATA        FREQUENCY   FREQ_VAL2
    001 2022-03-31  grape       3           7395
    001 2022-04-30  orange      3           7395
    001 2022-05-31  pineapple   3           7395
    002 2021-01-31  berry       0           0
    002 2021-02-28  banana      0           0
    003 2019-02-28  orange      0           0
    003 2021-12-31  grape       2           334
    003 2022-01-31  banana      2           334