I have a (possibly silly) question. I have to create a view in Snowflake where I have a date column, tenant, count(number_of_requests) and count(number_f_collections). This part I've done and am happy with.
Basic code below:
select date_trunc(day, CREATED)::date as CREATED_DATE,
TENANT,
count(distinct case when comment = 'Non personalised order created' then SIMNO end) as NUMBER_OF_REQUESTS,
count(distinct case when comment = 'Non personalised order collected by customer' then SIMNO end) as NUMBER_OF_COLLECTIONS
from <DATABASE>.<SCHEMA>.<TABLE>;
The second part is what I'm struggling with. In the same view they want a column that contains the count of requests per tenant in the last 1-7 days, requests sent 1-12 days, and then 1-25 days. The same will be for collections. I have no clue how to even do this. Please see required column names.
Any help would be very highly appreciated.
I essentially only managed to create the base code. I tried using between dates, interval - 7, etc. None of it works.
The results will be something like:
Date Tenant No_Requests No_Collections Request_1_7Days Request_1_12Days Request_1_25Days Collection_1_7Days Collection_1_12Days Collection_1_25Days 12/1/2023 Tenant1 65 46 455 780 1625 322 552 1150 12/2/2023 Tenant2 56 53 392 672 1400 371 636 1325 12/3/2023 Tenant3 124 94 868 1488 3100 658 1128 2350 12/4/2023 Tenant4 176 82 1232 2112 4400 574 984 2050
So if we start with some fake data, and you SQL:
with data(created, tenant, comment, simno) as (
select *
from values
('2023-12-01'::timestamp, 't1', 'R', 1),
('2023-12-01'::timestamp, 't1', 'R', 1),
('2023-12-01'::timestamp, 't1', 'R', 2),
('2023-12-01'::timestamp, 't1', 'C', 3),
('2023-12-01'::timestamp, 't1', 'C', 3)
)
select
CREATED::date as CREATED_DATE,
TENANT,
count(distinct case when comment = 'R' then SIMNO end) as NUMBER_OF_REQUESTS,
count(distinct case when comment = 'C' then SIMNO end) as NUMBER_OF_COLLECTIONS
from data
group by 1,2;
we a see you could use simpler date cast, to truncate to day, you "this code works" needs a group by added.
But otherwise is a good start. I am going to assume the key point you are not sure is how to do (and didn't mention) is that the sum on "simno" has a distinct in it, thus there can be duplicate values, and again an assumption, in the 1-7, 1-12, and 1-25 periods you also want distinct counts, thus the aggregation has to happen three times, or we need to use BITMAPS.
Given I have never used BITMAPS lets go that way (it the optimal path for huge data, thus the more fun path).
with data(created, tenant, comment, simno) as (
select *
from values
('2023-12-01'::timestamp, 't1', 'R', 1),
('2023-12-01'::timestamp, 't1', 'R', 1),
('2023-12-01'::timestamp, 't1', 'R', 2),
('2023-12-01'::timestamp, 't1', 'C', 3),
('2023-12-01'::timestamp, 't1', 'C', 3),
('2023-12-02'::timestamp, 't1', 'R', 3),
('2023-12-02'::timestamp, 't1', 'R', 4),
('2023-12-02'::timestamp, 't1', 'R', 5),
('2023-12-02'::timestamp, 't1', 'C', 7),
('2023-12-02'::timestamp, 't1', 'C', 1)
), enriched_data as (
select *
,created::date as created_date
,comment = 'R' as is_request
,comment = 'C' as is_collection
from data
), simno_seq_map as (
select
simno
,seq8() as seq
from (
select distinct simno
from data
)
), mapped_data as (
select d.*
,seq
from enriched_data as d
join simno_seq_map as m
on d.simno = m.simno
), daily_bitmaps as (
select
created_date,
tenant,
BITMAP_BUCKET_NUMBER(seq) as bit_bucket,
BITMAP_CONSTRUCT_AGG(iff(is_request, BITMAP_BIT_POSITION(seq), null)) as req_bit_bmp,
BITMAP_CONSTRUCT_AGG(iff(is_collection, BITMAP_BIT_POSITION(seq), null)) as coll_bit_bmp
from mapped_data
group by 1,2,3
)
select
created_date,
tenant,
sum(r_cnt) as num_requests,
sum(c_cnt) as num_collections
from (
select
created_date,
tenant,
BITMAP_COUNT(BITMAP_OR_AGG(req_bit_bmp)) as r_cnt,
BITMAP_COUNT(BITMAP_OR_AGG(coll_bit_bmp)) as c_cnt
from daily_bitmaps
group by 1,2, bit_bucket
)
group by 1,2
order by 1,2;
Ok, so that lots of work to get to the same place, but now we have daily bitmaps, we can do the combinations of this rows, and have much lower, rows results.
with data(created, tenant, comment, simno) as (
select *
from values
('2023-12-01'::timestamp, 't1', 'R', 1),
('2023-12-01'::timestamp, 't1', 'R', 1),
('2023-12-01'::timestamp, 't1', 'R', 2),
('2023-12-01'::timestamp, 't1', 'C', 3),
('2023-12-01'::timestamp, 't1', 'C', 3),
('2023-12-02'::timestamp, 't1', 'R', 3),
('2023-12-02'::timestamp, 't1', 'R', 4),
('2023-12-02'::timestamp, 't1', 'R', 5),
('2023-12-02'::timestamp, 't1', 'C', 7),
('2023-12-02'::timestamp, 't1', 'C', 1)
), enriched_data as (
select *
,created::date as created_date
,comment = 'R' as is_request
,comment = 'C' as is_collection
from data
), simno_seq_map as (
select
simno
,seq8() as seq
from (
select distinct simno
from data
)
), mapped_data as (
select d.*
,seq
from enriched_data as d
join simno_seq_map as m
on d.simno = m.simno
), daily_bitmaps as (
select
created_date,
tenant,
BITMAP_BUCKET_NUMBER(seq) as bit_bucket,
BITMAP_CONSTRUCT_AGG(iff(is_request, BITMAP_BIT_POSITION(seq), null)) as req_bit_bmp,
BITMAP_CONSTRUCT_AGG(iff(is_collection, BITMAP_BIT_POSITION(seq), null)) as coll_bit_bmp
from mapped_data
group by 1,2,3
), data_today as (
select
created_date,
tenant,
sum(r_cnt) as num_requests,
sum(c_cnt) as num_collections
from (
select
created_date,
tenant,
BITMAP_COUNT(BITMAP_OR_AGG(req_bit_bmp)) as r_cnt,
BITMAP_COUNT(BITMAP_OR_AGG(coll_bit_bmp)) as c_cnt
from daily_bitmaps
group by 1,2, bit_bucket
)
group by 1,2
), data_1_7_win as (
select db.*
,dateadd('day', w.v, db.created_date) as window_date
from daily_bitmaps as db
cross join (values (1),(2),(3),(4),(5),(6),(7)) as w(v)
), data_1_7 as (
select
created_date,
tenant,
sum(r_1_7_cnt) as num_requests_1_7,
sum(c_1_7_cnt) as num_collections_1_7
from (
select
d.created_date,
d.tenant,
d.bit_bucket,
BITMAP_COUNT(BITMAP_OR_AGG(dw.req_bit_bmp)) as r_1_7_cnt,
BITMAP_COUNT(BITMAP_OR_AGG(dw.coll_bit_bmp)) as c_1_7_cnt
from daily_bitmaps as d
join data_1_7_win as dw
on d.tenant = dw.tenant
and d.bit_bucket = dw.bit_bucket
and d.created_date = dw.window_date
group by 1,2, d.bit_bucket
)
group by 1,2
)
select
dt.*,
d7.num_requests_1_7,
d7.num_collections_1_7
from data_today as dt
left join data_1_7 as d7
on dt.created_date = d7.created_date
and dt.tenant = d7.tenant
order by 1,2
;
Alrighty, that was a large lift, firstly we use bucketing of the last 1-7 days, to make more rows, but allows for equi-joins, so that complex but fast. Then we sum the prior days, and bind those with the results, thus to get 1-12, and 1-25, you would most likely want to repeat the data_1_7_win
CTE but with 12 & 25 values, and then do data_1_7
for those window values.
Yes this could all be done much the same with range joins, but if you have large amounts of data, that way will perform much slower than this way.
with data(created, tenant, comment, simno) as (
select *
from values
('2023-12-01'::timestamp, 't1', 'R', 10),
('2023-12-01'::timestamp, 't1', 'C', 20),
('2023-12-01'::timestamp, 't1', 'C', 21),
('2023-11-30'::timestamp, 't1', 'R', 30),
('2023-11-30'::timestamp, 't1', 'R', 31),
('2023-11-30'::timestamp, 't1', 'R', 32),
('2023-11-30'::timestamp, 't1', 'C', 40),
('2023-11-30'::timestamp, 't1', 'C', 41),
('2023-11-30'::timestamp, 't1', 'C', 42),
('2023-11-30'::timestamp, 't1', 'C', 43),
('2023-11-23'::timestamp, 't1', 'R', 50),
('2023-11-23'::timestamp, 't1', 'R', 51),
('2023-11-23'::timestamp, 't1', 'R', 52),
('2023-11-23'::timestamp, 't1', 'R', 53),
('2023-11-23'::timestamp, 't1', 'R', 54),
('2023-11-23'::timestamp, 't1', 'C', 60),
('2023-11-23'::timestamp, 't1', 'C', 61),
('2023-11-23'::timestamp, 't1', 'C', 62),
('2023-11-23'::timestamp, 't1', 'C', 63),
('2023-11-23'::timestamp, 't1', 'C', 64),
('2023-11-23'::timestamp, 't1', 'C', 65),
('2023-11-16'::timestamp, 't1', 'R', 70),
('2023-11-16'::timestamp, 't1', 'R', 71),
('2023-11-16'::timestamp, 't1', 'R', 72),
('2023-11-16'::timestamp, 't1', 'R', 73),
('2023-11-16'::timestamp, 't1', 'R', 74),
('2023-11-16'::timestamp, 't1', 'R', 75),
('2023-11-16'::timestamp, 't1', 'R', 76),
('2023-11-16'::timestamp, 't1', 'C', 80),
('2023-11-16'::timestamp, 't1', 'C', 81),
('2023-11-16'::timestamp, 't1', 'C', 82),
('2023-11-16'::timestamp, 't1', 'C', 83),
('2023-11-16'::timestamp, 't1', 'C', 84),
('2023-11-16'::timestamp, 't1', 'C', 85),
('2023-11-16'::timestamp, 't1', 'C', 86),
('2023-11-16'::timestamp, 't1', 'C', 87)
), enriched_data as (
select *
,created::date as created_date
,comment = 'R' as is_request
,comment = 'C' as is_collection
from data
), simno_seq_map as (
select
simno
,seq8() as seq
from (
select distinct simno
from data
)
), mapped_data as (
select d.*
,seq
from enriched_data as d
join simno_seq_map as m
on d.simno = m.simno
), daily_bitmaps as (
select
created_date,
tenant,
BITMAP_BUCKET_NUMBER(seq) as bit_bucket,
BITMAP_CONSTRUCT_AGG(iff(is_request, BITMAP_BIT_POSITION(seq), null)) as req_bit_bmp,
BITMAP_CONSTRUCT_AGG(iff(is_collection, BITMAP_BIT_POSITION(seq), null)) as coll_bit_bmp
from mapped_data
group by 1,2,3
), data_today as (
select
created_date,
tenant,
sum(r_cnt) as num_requests,
sum(c_cnt) as num_collections
from (
select
created_date,
tenant,
BITMAP_COUNT(BITMAP_OR_AGG(req_bit_bmp)) as r_cnt,
BITMAP_COUNT(BITMAP_OR_AGG(coll_bit_bmp)) as c_cnt
from daily_bitmaps
group by 1,2, bit_bucket
)
group by 1,2
), data_1_7_win as (
select db.*
,dateadd('day', w.v, db.created_date) as window_date
from daily_bitmaps as db
cross join (values (1),(2),(3),(4),(5),(6),(7)) as w(v)
), data_1_7 as (
select
created_date,
tenant,
sum(r_1_7_cnt) as num_requests_1_7,
sum(c_1_7_cnt) as num_collections_1_7
from (
select
d.created_date,
d.tenant,
d.bit_bucket,
BITMAP_COUNT(BITMAP_OR_AGG(dw.req_bit_bmp)) as r_1_7_cnt,
BITMAP_COUNT(BITMAP_OR_AGG(dw.coll_bit_bmp)) as c_1_7_cnt
from daily_bitmaps as d
join data_1_7_win as dw
on d.tenant = dw.tenant
and d.bit_bucket = dw.bit_bucket
and d.created_date = dw.window_date
group by 1,2, d.bit_bucket
)
group by 1,2
), data_1_12_win as (
select db.*
,dateadd('day', w.v, db.created_date) as window_date
from daily_bitmaps as db
cross join (values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12)) as w(v)
), data_1_12 as (
select
created_date,
tenant,
sum(r_1_12_cnt) as num_requests_1_12,
sum(c_1_12_cnt) as num_collections_1_12
from (
select
d.created_date,
d.tenant,
d.bit_bucket,
BITMAP_COUNT(BITMAP_OR_AGG(dw.req_bit_bmp)) as r_1_12_cnt,
BITMAP_COUNT(BITMAP_OR_AGG(dw.coll_bit_bmp)) as c_1_12_cnt
from daily_bitmaps as d
join data_1_12_win as dw
on d.tenant = dw.tenant
and d.bit_bucket = dw.bit_bucket
and d.created_date = dw.window_date
group by 1,2, d.bit_bucket
)
group by 1,2
), data_1_25_win as (
select db.*
,dateadd('day', w.v, db.created_date) as window_date
from daily_bitmaps as db
cross join (values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),
(14),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25)) as w(v)
), data_1_25 as (
select
created_date,
tenant,
sum(r_1_25_cnt) as num_requests_1_25,
sum(c_1_25_cnt) as num_collections_1_25
from (
select
d.created_date,
d.tenant,
d.bit_bucket,
BITMAP_COUNT(BITMAP_OR_AGG(dw.req_bit_bmp)) as r_1_25_cnt,
BITMAP_COUNT(BITMAP_OR_AGG(dw.coll_bit_bmp)) as c_1_25_cnt
from daily_bitmaps as d
join data_1_25_win as dw
on d.tenant = dw.tenant
and d.bit_bucket = dw.bit_bucket
and d.created_date = dw.window_date
group by 1,2, d.bit_bucket
)
group by 1,2
)
select
dt.*,
d7.num_requests_1_7,
d7.num_collections_1_7,
d12.num_requests_1_12,
d12.num_collections_1_12,
d25.num_requests_1_25,
d25.num_collections_1_25
from data_today as dt
left join data_1_7 as d7
on dt.created_date = d7.created_date
and dt.tenant = d7.tenant
left join data_1_12 as d12
on dt.created_date = d12.created_date
and dt.tenant = d12.tenant
left join data_1_25 as d25
on dt.created_date = d25.created_date
and dt.tenant = d25.tenant
order by 1,2
;
gives:
given I spaced the values out, and they are all distinct id's this makes sense to me.
Add some non-distinct id's:
-- not distinct ids
('2023-12-01'::timestamp, 't2', 'R', 10),
('2023-12-01'::timestamp, 't2', 'C', 20),
('2023-12-01'::timestamp, 't2', 'C', 21),
('2023-11-30'::timestamp, 't2', 'R', 10),
('2023-11-30'::timestamp, 't2', 'R', 11),
('2023-11-30'::timestamp, 't2', 'R', 12),
('2023-11-30'::timestamp, 't2', 'C', 20),
('2023-11-30'::timestamp, 't2', 'C', 21),
('2023-11-30'::timestamp, 't2', 'C', 22),
('2023-11-30'::timestamp, 't2', 'C', 23),
('2023-11-23'::timestamp, 't2', 'R', 10),
('2023-11-23'::timestamp, 't2', 'R', 11),
('2023-11-23'::timestamp, 't2', 'R', 12),
('2023-11-23'::timestamp, 't2', 'R', 13),
('2023-11-23'::timestamp, 't2', 'R', 14),
('2023-11-23'::timestamp, 't2', 'C', 20),
('2023-11-23'::timestamp, 't2', 'C', 21),
('2023-11-23'::timestamp, 't2', 'C', 22),
('2023-11-23'::timestamp, 't2', 'C', 23),
('2023-11-23'::timestamp, 't2', 'C', 24),
('2023-11-23'::timestamp, 't2', 'C', 25),
('2023-11-16'::timestamp, 't2', 'R', 10),
('2023-11-16'::timestamp, 't2', 'R', 11),
('2023-11-16'::timestamp, 't2', 'R', 12),
('2023-11-16'::timestamp, 't2', 'R', 13),
('2023-11-16'::timestamp, 't2', 'R', 14),
('2023-11-16'::timestamp, 't2', 'R', 15),
('2023-11-16'::timestamp, 't2', 'R', 16),
('2023-11-16'::timestamp, 't2', 'C', 20),
('2023-11-16'::timestamp, 't2', 'C', 21),
('2023-11-16'::timestamp, 't2', 'C', 22),
('2023-11-16'::timestamp, 't2', 'C', 23),
('2023-11-16'::timestamp, 't2', 'C', 24),
('2023-11-16'::timestamp, 't2', 'C', 25),
('2023-11-16'::timestamp, 't2', 'C', 26),
('2023-11-16'::timestamp, 't2', 'C', 27)
we get:
which is exactly how I imagined, it would look.
I assume this is about the left joins I used to connect the prior days, while I agree there is no-null in your data, this code relies of this, aka lets remove the LEFT:
from data_today as dt
join data_1_7 as d7
on dt.created_date = d7.created_date
and dt.tenant = d7.tenant
join data_1_12 as d12
on dt.created_date = d12.created_date
and dt.tenant = d12.tenant
join data_1_25 as d25
on dt.created_date = d25.created_date
and dt.tenant = d25.tenant
and now get are missing the 2023-11-16 row... because there was no history for this row:
So from here we can drop the window bucket style and we can drop the bitmap usage..
with data(created, tenant, comment, simno) as (
/* same as above */
), enriched_data as (
select *
,created::date as created_date
,comment = 'R' as is_request
,comment = 'C' as is_collection
from data
), data_today as (
select
created_date,
tenant,
count(distinct iff(is_request, simno, null)) as cnt_r,
count(distinct iff(is_collection, simno, null)) as cnt_c
from enriched_data
group by 1,2
), data_1_7_win as (
select db.*
,dateadd('day', w.v, db.created_date) as window_date
from enriched_data as db
cross join (values (1),(2),(3),(4),(5),(6),(7)) as w(v)
), data_1_7 as (
select
d.created_date,
d.tenant,
count(distinct iff(dw.is_request, dw.simno, null)) as cnt_r,
count(distinct iff(dw.is_collection, dw.simno, null)) as cnt_c
from enriched_data as d
join data_1_7_win as dw
on d.tenant = dw.tenant
and d.created_date = dw.window_date
group by 1,2
), data_1_12_win as (
select db.*
,dateadd('day', w.v, db.created_date) as window_date
from enriched_data as db
cross join (values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12)) as w(v)
), data_1_12 as (
select
d.created_date,
d.tenant,
count(distinct iff(dw.is_request, dw.simno, null)) as cnt_r,
count(distinct iff(dw.is_collection, dw.simno, null)) as cnt_c
from enriched_data as d
join data_1_12_win as dw
on d.tenant = dw.tenant
and d.created_date = dw.window_date
group by 1,2
), data_1_25_win as (
select db.*
,dateadd('day', w.v, db.created_date) as window_date
from enriched_data as db
cross join (values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),
(14),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25)) as w(v)
), data_1_25 as (
select
d.created_date,
d.tenant,
count(distinct iff(dw.is_request, dw.simno, null)) as cnt_r,
count(distinct iff(dw.is_collection, dw.simno, null)) as cnt_c
from enriched_data as d
join data_1_25_win as dw
on d.tenant = dw.tenant
and d.created_date = dw.window_date
group by 1,2
)
select
dt.*,
d7.cnt_r as r_1_7,
d7.cnt_c as c_1_7,
d12.cnt_r as r_1_12,
d12.cnt_c as c_1_12,
d25.cnt_r as r_1_25,
d25.cnt_c as c_1_25
from data_today as dt
left join data_1_7 as d7
on dt.created_date = d7.created_date
and dt.tenant = d7.tenant
left join data_1_12 as d12
on dt.created_date = d12.created_date
and dt.tenant = d12.tenant
left join data_1_25 as d25
on dt.created_date = d25.created_date
and dt.tenant = d25.tenant
order by 2,1;
gives:
but takes for this tiny data 1.5s instead on 0.8s
So here is the answer you were expecting I believe:
with data(created, tenant, comment, simno) as (
/* also trimmed */
), enriched_data as (
select *
,created::date as created_date
,comment = 'R' as is_request
,comment = 'C' as is_collection
from data
), data_today as (
select
created_date,
tenant,
count(distinct iff(is_request, simno, null)) as cnt_r,
count(distinct iff(is_collection, simno, null)) as cnt_c
from enriched_data
group by 1,2
), data_1_7 as (
select
d.created_date,
d.tenant,
count(distinct iff(dw.is_request, dw.simno, null)) as cnt_r,
count(distinct iff(dw.is_collection, dw.simno, null)) as cnt_c
from enriched_data as d
join enriched_data as dw
on d.tenant = dw.tenant
and d.created_date between dateadd('days', 1, dw.created_date) and dateadd('days', 7, dw.created_date)
group by 1,2
), data_1_12 as (
select
d.created_date,
d.tenant,
count(distinct iff(dw.is_request, dw.simno, null)) as cnt_r,
count(distinct iff(dw.is_collection, dw.simno, null)) as cnt_c
from enriched_data as d
join enriched_data as dw
on d.tenant = dw.tenant
and d.created_date between dateadd('days', 1, dw.created_date) and dateadd('days', 12, dw.created_date)
group by 1,2
), data_1_25 as (
select
d.created_date,
d.tenant,
count(distinct iff(dw.is_request, dw.simno, null)) as cnt_r,
count(distinct iff(dw.is_collection, dw.simno, null)) as cnt_c
from enriched_data as d
join enriched_data as dw
on d.tenant = dw.tenant
and d.created_date between dateadd('days', 1, dw.created_date) and dateadd('days', 25, dw.created_date)
group by 1,2
)
select
dt.*,
d7.cnt_r as r_1_7,
d7.cnt_c as c_1_7,
d12.cnt_r as r_1_12,
d12.cnt_c as c_1_12,
d25.cnt_r as r_1_25,
d25.cnt_c as c_1_25
from data_today as dt
left join data_1_7 as d7
on dt.created_date = d7.created_date
and dt.tenant = d7.tenant
left join data_1_12 as d12
on dt.created_date = d12.created_date
and dt.tenant = d12.tenant
left join data_1_25 as d25
on dt.created_date = d25.created_date
and dt.tenant = d25.tenant
order by 2,1
;
which gives the same results as prior:
On this tiny dataset it performs in the same 1.5s time range on my otherwise idle XTRASMALL warehouse. But when you have the number of rows, I suspect you have they should perform quite differently.