So let's say I have some data as follows. It is a table where the distinct identifier for each row is the ID
+ the MONTH_END
.
TABLE1
ID MONTH_END data
001 2022-03-31 grape
001 2022-04-30 orange
001 2022-05-31 pineapple
002 2021-01-31 berry
002 2021-02-28 banana
003 2019-02-28 orange
003 2021-12-31 grape
003 2022-01-31 banana
003 2019-02-28 orange
My goal here is I want to make a new variable called frequency
where I take a new table and determine how many distinct times the ID
appears before and during the MONTH_END
in table1.
TABLE2
ID freq_date freq_val
001 2022-03-31 4
002 2022-02-20 242
003 2021-12-31 122
004 2020-07-02 342
001 2021-08-23 44
001 2021-01-13 7347
003 2021-01-21 212
Additionally, I would like the sum of the freq_vals
in the same fashion called freq_val2
.
Final expected output:
ID MONTH_END data frequency freq_val2
001 2022-03-31 grape 3 7395
001 2022-04-30 orange 3 7395
001 2022-05-31 pineapple 3 7395
002 2021-01-31 berry 0 0
002 2021-02-28 banana 1 242
003 2019-02-28 orange 0 0
003 2021-12-31 grape 1 122
003 2022-01-31 banana 2 334
003 2019-02-28 orange 2 334
I think the logic is correct in the query, but I'm unable to obtain the same results as shown above - please read my comment regarding some questions about your test data (duplicate rows, and values that exceed the logic threshold described).
--
-- Create tables and populate with test data
--
create
or replace table table_1 (id varchar, month_end date, data varchar);
insert into
table_1
values
('001','2022-03-31'::date,'grape'),
('001','2022-04-30'::date,'orange'),
('001','2022-05-31'::date,'pineapple'),
('002','2021-01-31'::date,'berry'),
('002','2021-02-28'::date,'banana'),
('003','2019-02-28'::date,'orange'),
('003','2021-12-31'::date,'grape'),
('003','2022-01-31'::date,'banana'),
('003','2019-02-28'::date,'orange');
create
or replace table table_2 (id varchar, freq_date date, freq_val integer);
insert into
table_2
values
('001','2022-03-31'::date, 4),
('002','2022-02-20'::date, 242),
('003','2021-12-31'::date, 122),
('004','2020-07-02'::date, 342),
('001','2021-08-23'::date, 44),
('001','2021-01-13'::date, 7347),
('003','2021-01-21'::date, 212);
--
-- Results query
--
select
t1.id,
t1.month_end,
t1.data,
count(t2.id) as frequency,
sum(nvl(t2.freq_val, 0)) as freq_val2
from
table_1 t1
left join table_2 t2 on t1.id = t2.id
and t2.freq_date <= t1.month_end
group by
1,
2,
3
order by
1,
2;
Results:
ID MONTH_END DATA FREQUENCY FREQ_VAL2
001 2022-03-31 grape 3 7395
001 2022-04-30 orange 3 7395
001 2022-05-31 pineapple 3 7395
002 2021-01-31 berry 0 0
002 2021-02-28 banana 0 0
003 2019-02-28 orange 0 0
003 2021-12-31 grape 2 334
003 2022-01-31 banana 2 334