From the table,I have the first table now, and trying to get every 7 day sales from the first day of this customer make the purchase. example is table 2
purchase date | customer id | sales unit |
---|---|---|
2018-01-01 | 1 | 10 |
2018-01-02 | 1 | 5 |
2018-01-05 | 2 | 3 |
2018-01-15 | 1 | 10 |
2018-01-20 | 2 | 4 |
2018-01-21 | 2 | 5 |
purchase date | customer id | sales unit | every 7 day cumulative sales |
---|---|---|---|
2018-01-01 | 1 | 10 | 10 |
2018-01-02 | 1 | 5 | 15 |
2018-01-15 | 1 | 10 | 10 |
2018-01-05 | 2 | 3 | 3 |
2018-01-20 | 2 | 4 | 9 |
2018-01-21 | 2 | 5 | 9 |
The final table shud be like this:
purchase week | customer id | 7 day sales unit |
---|---|---|
2018-01-01 | 1 | 15 |
2018-01-05 | 2 | 3 |
2018-01-15 | 1 | 10 |
2018-01-20 | 2 | 4 |
then I can calculate the average sales per customer
customer id | average of every 7 day sales unit | Calculation |
---|---|---|
1 | 12.5 | (15+10) /2 |
2 | 3.5 | (3+4) /2 |
the hard part is:
every customer have different first day of purchase
The purchase date are not consequent, so I can't use unbonded or following 6 rows etc.
There're 5 years in the whole dataset, so I can't manually -7, -14, etc
I tried to use date_trunc('week',date, min(date) over (partition by customerid))
also tried parition by rows between 6 proceding and current row. But the date are not consequent so doesn't work
You can use SQL window function with 2 steps to get the result you want:
Step 1. Apply a window parition by per customer and get first_purchase_date for each customer. After that, use Presto date_diff() fucntion to calculate the date difference from first purchase date to current purchase date. Devide it by 7 to get week_bucket from the first date of purchase.
Step 2. group by per (customer, customer_sale_week_bucket) and get sum(sales_unit) and min(purchase_date) in each (customer, customer_sale_week_bucket) parition.
Here is the query:
with orders_with_customer_week_bucket AS
(
select
purchase_date,
customer_id,
sales_unit,
date_diff(day,min(purchase_date) over (partition by customer_id), purchase_date) / 7 as customer_sale_week_bucket
from
orders
)
select
purchase_week,
customer_id,
seven_day_sales_unit
from
(select
customer_id,
customer_sale_week_bucket,
min(purchase_date) as purchase_week,
sum(sales_unit) as seven_day_sales_unit
from
orders_with_customer_week_bucket
GROUP BY
customer_id,
customer_sale_week_bucket
)r
purchase_week | customer_id | seven_day_sales_unit |
---|---|---|
2018-01-01 | 1 | 15 |
2018-01-05 | 2 | 3 |
2018-01-15 | 1 | 10 |
2018-01-20 | 2 | 9 |