sql, amazon-redshift, window-functions, lag

Display # of customers per month whose previous Sale date is more than 3 months old, and # of these customers that have a "Sale Date" in the given month


For a given month, I need to know how many customers had their previous "Sale date" more than 3 months before that month and, of these customers, how many have a "Sale date" in the given month.

I tried using the LAG function, but my "Reactivated_Guests" column always comes back NULL.

SELECT datepart(month,["sale date"]) "Sale_Month",
count(distinct ["user id"]) "Lost_Guests",
lag("Guests",4) OVER (ORDER BY "Sale_Month")+
lag("Guests",5) OVER (ORDER BY "Sale_Month")+
lag("Guests",6) OVER (ORDER BY "Sale_Month")+
lag("Guests",7) OVER (ORDER BY "Sale_Month")+
lag("Guests",8) OVER (ORDER BY "Sale_Month")+
lag("Guests",9) OVER (ORDER BY "Sale_Month")+
lag("Guests",10) OVER (ORDER BY "Sale_Month")+
lag("Guests",11) OVER (ORDER BY "Sale_Month")+
lag("Guests",12) OVER (ORDER BY "Sale_Month") "Reactivated_Guests"
group by "Sale_Month"
order by "Sale_Month"

My expected output is, for each month, the number of guests whose previous "Sale date" is more than 3 months before the given month (Lost_Guests) and, of these customers, how many have a "Sale date" in the given month (Reactivated_Guests).

Expected Result :

Sale_Month     Lost_Guests                     Reactivated_Guests
              (prev Sale date > 3 months)  (Prev Sale date > 3 months and 
                                            have a Sale date in given month)
  June         1,200                            110
  July         1,800                            130
  Aug          1,900                            140

Actual Result :

Sale_Month     Lost_Guests      Reactivated_Guests

  June         1,200               null
  July         1,800               null
  Aug          1,900               null
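
One likely reason for the NULLs: LAG returns NULL whenever fewer rows precede the current row than the offset asks for, and adding NULL to anything yields NULL. The sketch below reproduces this on a hypothetical three-row `monthly` table, using SQLite (3.25 or later, for window functions) through Python's `sqlite3` module as a stand-in for Redshift. The table, its values, and the engine are all assumptions, and `COALESCE` is shown only as one way to stop the sum from collapsing.

```python
import sqlite3

# Hypothetical monthly totals; SQLite stands in for Redshift here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly (sale_month INTEGER, guests INTEGER)")
conn.executemany(
    "INSERT INTO monthly VALUES (?, ?)",
    [(6, 10), (7, 20), (8, 30)],
)

rows = conn.execute(
    """
    SELECT sale_month,
           -- No row exists 4 or 5 positions back, so both LAGs are NULL
           LAG(guests, 4) OVER (ORDER BY sale_month)
             + LAG(guests, 5) OVER (ORDER BY sale_month) AS raw_sum,
           -- COALESCE turns each missing value into 0 before adding
           COALESCE(LAG(guests, 4) OVER (ORDER BY sale_month), 0)
             + COALESCE(LAG(guests, 5) OVER (ORDER BY sale_month), 0) AS safe_sum
    FROM monthly
    ORDER BY sale_month
    """
).fetchall()

for row in rows:
    print(row)  # raw_sum is None (NULL) on every row, safe_sum is 0
```

Even with `COALESCE`, summing lagged monthly totals does not follow individual customers from month to month, so it would probably not produce the Reactivated_Guests count on its own.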

Sample Data :

Customer      Sale Date
AAAAA        11/15/2018
BBBBB        11/16/2018
CCCCC         9/23/2018
CCCCC         1/25/2019
AAAAA         3/16/2019
CCCCC         3/18/2019

For the given month of March 2019:

AAAAA is to be counted in "Lost_Guests" because AAAAA's previous sale date (11/15/2018) is more than 3 months before the given month, and in "Reactivated_Guests" because AAAAA has a Sale date in the given month (3/16/2019).

CCCCC is not to be counted in "Lost_Guests" because CCCCC's previous sale date (1/25/2019) is less than 3 months before the given month, and hence does not appear in "Reactivated_Guests" either.

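The rule described for March 2019 can be checked against the sample data with a small sketch. It assumes one particular reading of the requirement: a customer is "lost" when their latest sale before the first day of the month falls more than 3 months before that day, and "reactivated" when they also have a sale inside the month. The `sales` table name, the ISO date strings, and the use of SQLite through Python's `sqlite3` (instead of Redshift) are assumptions made so the example runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (user_id TEXT, sale_date TEXT)")
# Sample data from the question, rewritten as ISO dates so text comparison works
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("AAAAA", "2018-11-15"),
        ("BBBBB", "2018-11-16"),
        ("CCCCC", "2018-09-23"),
        ("CCCCC", "2019-01-25"),
        ("AAAAA", "2019-03-16"),
        ("CCCCC", "2019-03-18"),
    ],
)

month_start = "2019-03-01"  # the "given month" is March 2019

query = """
WITH prev AS (
    -- Latest sale of each customer strictly before the given month
    SELECT user_id, MAX(sale_date) AS prev_sale_date
    FROM sales
    WHERE sale_date < :month_start
    GROUP BY user_id
),
lost AS (
    -- Lost guests have that latest sale more than 3 months before the month start
    SELECT user_id
    FROM prev
    WHERE prev_sale_date < date(:month_start, '-3 months')
)
SELECT l.user_id,
       -- 1 when the lost guest also has a sale inside the given month
       EXISTS (
           SELECT 1 FROM sales s
           WHERE s.user_id = l.user_id
             AND s.sale_date >= :month_start
             AND s.sale_date < date(:month_start, '+1 month')
       ) AS reactivated
FROM lost l
ORDER BY l.user_id
"""
result = conn.execute(query, {"month_start": month_start}).fetchall()
lost_guests = len(result)
reactivated_guests = sum(flag for _, flag in result)
print(result)                           # [('AAAAA', 1), ('BBBBB', 0)]
print(lost_guests, reactivated_guests)  # 2 1
```

Under this reading BBBBB also counts as lost for March, since BBBBB's only sale (11/16/2018) is more than 3 months old. The question does not say so explicitly, so treat that as an interpretation.
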
Solution

  • This addresses the original version of the question.

    You seem to want something like this:

    select sale_month, count(distinct user_id) as guests,
           count(distinct case when min_sale_date < sale_date - interval '3 month' then user_id end) as old_guests
    from (select t.*,
                 min(sale_date) over (partition by user_id) as min_sale_date
          from t
         ) t
    group by sale_month
    order by sale_month;
    

    Note that date functions are highly database-dependent, so the exact syntax may differ on your database.
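
As an illustration of how the date syntax shifts between engines, here is the same query shape adapted to SQLite and run on the question's sample data through Python's `sqlite3`. This is a sketch under assumptions, not the Redshift form: `interval '3 month'` becomes `date(sale_date, '-3 months')`, `sale_month` is derived with `strftime` because the sample data has no such column, and the dates are stored as ISO strings. On Redshift, `DATEADD(month, -3, sale_date)` is another way to write the same subtraction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (user_id TEXT, sale_date TEXT)")
# Sample data from the question, as ISO date strings
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        ("AAAAA", "2018-11-15"),
        ("BBBBB", "2018-11-16"),
        ("CCCCC", "2018-09-23"),
        ("CCCCC", "2019-01-25"),
        ("AAAAA", "2019-03-16"),
        ("CCCCC", "2019-03-18"),
    ],
)

rows = conn.execute(
    """
    SELECT sale_month,
           COUNT(DISTINCT user_id) AS guests,
           -- SQLite date arithmetic replaces "sale_date - interval '3 month'"
           COUNT(DISTINCT CASE
                     WHEN min_sale_date < date(sale_date, '-3 months')
                     THEN user_id
                 END) AS old_guests
    FROM (SELECT t.*,
                 strftime('%Y-%m', sale_date) AS sale_month,
                 MIN(sale_date) OVER (PARTITION BY user_id) AS min_sale_date
          FROM t) AS x
    GROUP BY sale_month
    ORDER BY sale_month
    """
).fetchall()

for row in rows:
    print(row)
# ('2018-09', 1, 0)
# ('2018-11', 2, 0)
# ('2019-01', 1, 1)
# ('2019-03', 2, 2)
```

For March 2019 this counts CCCCC as an old guest, because the comparison uses each customer's first sale rather than their previous one. That matches the caveat that the query addresses the original version of the question.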