I have below sales data and purchase data and using Hive Rank Over partition, tried to calculate the profit but it's not working.
Note: For same SKU Cost information can be updated multiple times in a day.
For order 1011, there is not entry in the cost table so it should consider the cost which was entered on 2022-05-19 06:50:20.000 as 32.5.
Below is the sample sales data:
Below is the sample cost information:
I was able to figure it and implement using window functions, below is the complete query:
SELECT sku, sum(s.unit_sale_price*s.quantity_sold) as total_sale, sum(s.quantity_sold) total_quantity_sold, sum(s.quantity_sold * e.considered_cost) total_cost, sum((s.unit_sale_price * s.quantity_sold)-(s.quantity_sold * e.considered_cost)) profit FROM sales s inner join ( SELECT sku, lead(update_timestamp,1) over ( partition by sku order by update_timestamp desc ) from_timestamp, update_timestamp to_timestamp, lead(cost,1) over ( partition by sku order by update_timestamp desc ) considered_cost, cost as original_cost, rank() OVER (PARTITION BY sku ORDER BY update_timestamp desc) as rk FROM cost ) e on e.sku = s.sku and s.posted_date between e.from_timestamp and e.to_timestamp group by s.sku
union all
SELECT sku, sum(s.unit_sale_prices.quantity_sold) as total_sale, sum(s.quantity_sold) total_quantity_sold, sum(s.quantity_sold * e.original_cost) total_cost, sum((s.unit_sale_prices.quantity_sold)-(s.quantity_sold * e.original_cost)) profit FROM sales s inner join ( SELECT sku, lead(update_timestamp,1) over ( partition by sku order by update_timestamp desc ) from_timestamp, update_timestamp to_timestamp, lead(cost,1) over ( partition by sku order by update_timestamp desc ) considered_cost, cost as original_cost, rank() OVER (PARTITION BY sku ORDER BY update_timestamp desc) as rk FROM cost ) e on e.sku = s.sku and s.posted_date > e.to_timestamp and rk =1 group by s.sku
union all
SELECT sku, sum(s.unit_sale_prices.quantity_sold) as total_sale, sum(s.quantity_sold) total_quantity_sold, sum(s.quantity_sold * e.original_cost) total_cost, sum((s.unit_sale_prices.quantity_sold)-(s.quantity_sold * e.original_cost)) profit FROM sales s inner join ( SELECT sku, lead(update_timestamp,1) over ( partition by sku order by update_timestamp asc ) from_timestamp, update_timestamp to_timestamp, lead(cost,1) over ( partition by sku order by update_timestamp asc ) considered_cost, cost as original_cost, rank() OVER (PARTITION BY sku ORDER BY update_timestamp asc) as rk FROM cost ) e on e.sku = s.sku and s.posted_date < e.to_timestamp and rk =1 group by s.sku
Explaination:
First I have transposed the cost table into from_timestamp and to_timstamp like below,
SELECT sku, lead(update_timestamp,1) over ( partition by sku order by update_timestamp desc ) from_timestamp, update_timestamp to_timestamp, lead(cost,1) over ( partition by sku order by update_timestamp desc ) considered_cost, cost as original_cost, rank() OVER (PARTITION BY sku ORDER BY update_timestamp desc) as rk FROM cost
sku , from_timestamp, to_timstamp, cosindered_cost, original_cost, rk
then applied inner join with sales table and build the aggregation logic to calculate the profit.