Search code examples
sqlwindow-functionspresto

calculate the 7 day sale by different start date (the first day customer purchase), to find average purchase unit of each customer by every 7 days


From the table,I have the first table now, and trying to get every 7 day sales from the first day of this customer make the purchase. example is table 2

purchase date customer id sales unit
2018-01-01 1 10
2018-01-02 1 5
2018-01-05 2 3
2018-01-15 1 10
2018-01-20 2 4
2018-01-21 2 5
purchase date customer id sales unit every 7 day cumulative sales
2018-01-01 1 10 10
2018-01-02 1 5 15
2018-01-15 1 10 10
2018-01-05 2 3 3
2018-01-20 2 4 9
2018-01-21 2 5 9

The final table shud be like this:

purchase week customer id 7 day sales unit
2018-01-01 1 15
2018-01-05 2 3
2018-01-15 1 10
2018-01-20 2 4

then I can calculate the average sales per customer

customer id average of every 7 day sales unit Calculation
1 12.5 (15+10) /2
2 3.5 (3+4) /2

the hard part is:

  1. every customer have different first day of purchase

  2. The purchase date are not consequent, so I can't use unbonded or following 6 rows etc.

  3. There're 5 years in the whole dataset, so I can't manually -7, -14, etc

  4. I tried to use date_trunc('week',date, min(date) over (partition by customerid))

  5. also tried parition by rows between 6 proceding and current row. But the date are not consequent so doesn't work


Solution

  • You can use SQL window function with 2 steps to get the result you want:

    Step 1. Apply a window parition by per customer and get first_purchase_date for each customer. After that, use Presto date_diff() fucntion to calculate the date difference from first purchase date to current purchase date. Devide it by 7 to get week_bucket from the first date of purchase.

    Step 2. group by per (customer, customer_sale_week_bucket) and get sum(sales_unit) and min(purchase_date) in each (customer, customer_sale_week_bucket) parition.

    Here is the query:

    with orders_with_customer_week_bucket AS 
    (
      select
       purchase_date,
       customer_id,
       sales_unit,
       date_diff(day,min(purchase_date) over (partition by customer_id), purchase_date) / 7 as customer_sale_week_bucket
    from
       orders
     )
     select
         purchase_week,
         customer_id,
         seven_day_sales_unit
     from
         (select
             customer_id,
             customer_sale_week_bucket,
             min(purchase_date) as purchase_week,
             sum(sales_unit) as seven_day_sales_unit
         from
            orders_with_customer_week_bucket
         GROUP BY
            customer_id,
            customer_sale_week_bucket
         )r
    
    purchase_week customer_id seven_day_sales_unit
    2018-01-01 1 15
    2018-01-05 2 3
    2018-01-15 1 10
    2018-01-20 2 9