sql, presto, trino

Don't want to double count in Filtered Aggregation


Sample Data:

shopper_id  last_purchase_timestamp  active_p30  active_p60  active_over_p90
1           2022-03-02 1:20:00       TRUE        TRUE        TRUE
2           2022-03-01 1:30:00       TRUE        TRUE        TRUE
3           2022-02-28 1:24:03       TRUE        TRUE        TRUE
4           2022-02-02 21:22:26      FALSE       TRUE        TRUE

I want to count whether each shopper was active (that is, made their last purchase) in the last 30 days (counting back from March 5th), the last 60 days, and so on.

My goal is to find how many shoppers bought their last item in the last 30 days, how many bought their last item in the last 60 days, etc. However, I do not want to double count.

What I've attempted:

count(*) FILTER (where last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '30' day)
AS total_active_p30,

count(*) FILTER (where last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '60' day)
AS total_active_p60,

count(*) FILTER (where last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '90' day)
AS total_active_p90

Results:

total_active_p30  total_active_p60  total_active_p90
3                 4                 4

However, this double counts: a shopper who was active in the last 30 days is also counted in the 60-day and 90-day columns. How can I prevent that? The counts across all columns should add up to 4, the number of shoppers.

My ideal output would be:

total_active_p30  total_active_p60  total_active_p90
3                 1                 0

Thanks in advance everyone! I'm using Trino!


Solution

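  • The filter conditions in your query overlap. Every row that satisfies the 30-day condition also satisfies the 60-day condition (>= DATE '2022-03-05' - INTERVAL '60' day), and every row that satisfies the 60-day condition also satisfies the 90-day condition (>= DATE '2022-03-05' - INTERVAL '90' day), so the same shopper is counted again in each wider window. To count each shopper only once, give every bucket an upper bound as well as a lower bound: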

    -- last 30 days
    count(*) filter (where last_purchase_timestamp >= (DATE '2022-03-05' - INTERVAL '30' day))
    as total_active_p30,

    -- 31 to 60 days back: excludes rows already counted in the 30-day bucket
    count(*) filter (where last_purchase_timestamp >= (DATE '2022-03-05' - INTERVAL '60' day)
                       and last_purchase_timestamp < (DATE '2022-03-05' - INTERVAL '30' day))
    as total_active_p60,

    -- 61 to 90 days back: excludes rows already counted in the 60-day bucket
    count(*) filter (where last_purchase_timestamp >= (DATE '2022-03-05' - INTERVAL '90' day)
                       and last_purchase_timestamp < (DATE '2022-03-05' - INTERVAL '60' day))
    as total_active_p90
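
For reference, a self-contained version of the bucketed query might look like the sketch below. It assumes Trino, inlines the four sample rows from the question with VALUES, and uses shoppers as a hypothetical relation name, since the question does not say what the table is called. The reference date is written as the typed literal DATE '2022-03-05'.

```sql
-- Sketch only: "shoppers" is a hypothetical name; the question does not name the table.
-- The four rows are the sample data from the question, inlined with VALUES.
SELECT
    -- last purchase within the 30 days before 2022-03-05
    count(*) FILTER (WHERE last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '30' DAY)
        AS total_active_p30,
    -- last purchase 31 to 60 days back (upper bound removes the 30-day rows)
    count(*) FILTER (WHERE last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '60' DAY
                       AND last_purchase_timestamp <  DATE '2022-03-05' - INTERVAL '30' DAY)
        AS total_active_p60,
    -- last purchase 61 to 90 days back (upper bound removes the 60-day rows)
    count(*) FILTER (WHERE last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '90' DAY
                       AND last_purchase_timestamp <  DATE '2022-03-05' - INTERVAL '60' DAY)
        AS total_active_p90
FROM (
    VALUES
        (1, TIMESTAMP '2022-03-02 01:20:00'),
        (2, TIMESTAMP '2022-03-01 01:30:00'),
        (3, TIMESTAMP '2022-02-28 01:24:03'),
        (4, TIMESTAMP '2022-02-02 21:22:26')
) AS shoppers (shopper_id, last_purchase_timestamp);
```

On the sample data this should return 3, 1, 0, which matches the ideal output in the question: the 30-day cutoff is 2022-02-03, so shopper 4 (2022-02-02 21:22:26) falls just outside it and lands in the 60-day bucket. Because the ranges do not overlap, each shopper is counted in at most one column.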