This is a problem I would normally do in python, but wanted to see if it could be done efficiently in SQL.
I have a sample forecasted sales dataset that looks like this:
item | location_id | forecast | week_begin | pack_size | row_number |
---|---|---|---|---|---|
green_pencil | 1234 | 1.43 | 12/3/23 | 6 | 1 |
green_pencil | 1234 | 2.67 | 12/10/23 | 6 | 2 |
green_pencil | 1234 | 0.82 | 12/17/23 | 6 | 3 |
green_pencil | 1234 | 1.23 | 12/24/23 | 6 | 4 |
green_pencil | 1234 | 1.23 | 12/31/23 | 6 | 5 |
green_pencil | 1234 | 1.23 | 1/7/24 | 6 | 6 |
blue_marker | 999 | 4.31 | 12/3/23 | 6 | 1 |
blue_marker | 999 | 6.47 | 12/10/23 | 6 | 2 |
blue_marker | 999 | 6.47 | 12/17/23 | 6 | 3 |
blue_marker | 999 | 6.47 | 12/24/23 | 6 | 4 |
blue_marker | 999 | 6.47 | 12/31/23 | 6 | 5 |
blue_marker | 999 | 6.47 | 1/7/24 | 6 | 6 |
The data is the forecasted sales units for the next 6 weeks for different items going to different stores. A row number has already been added to each item / location / week. The pack_size
columns is how many units get sent to the store when inventory is needed.
The goal is to create two columns units_to_store
and week_end_inventory
to determine when a store should be sent a pack of an item based on the forecasted sales. The ideal final state dataset is below (just for green_pencil, but would want it for every item / store combo):
item | location_id | forecast | week_begin | pack_size | row_number | units_to_store | week_end_inventory |
---|---|---|---|---|---|---|---|
green_pencil | 1234 | 1.43 | 12/3/23 | 6 | 1 | 6 | 4.57 |
green_pencil | 1234 | 2.67 | 12/10/23 | 6 | 2 | 0 | 1.9 |
green_pencil | 1234 | 0.82 | 12/17/23 | 6 | 3 | 0 | 1.08 |
green_pencil | 1234 | 1.23 | 12/24/23 | 6 | 4 | 6 | -0.15 |
green_pencil | 1234 | 1.23 | 12/31/23 | 6 | 5 | 0 | 4.77 |
green_pencil | 1234 | 1.23 | 1/7/24 | 6 | 6 | 0 | 3.54 |
The first week will always start with the items pack_size
in this case, 6 units. Then the goal is to take the forecasted sales and have a running sum until that number gets below 0, in which a new pack should be sent to the stores and the running sum starts again with the pack_size
. The week_end_inventory
for the first week will be the pack_size
- forecast
and then the forecast
will be subtracted in the next rows / weeks until it gets below 0 and the process will start over again.
What is the best way to do this? I have tried to use the lag
function but have trouble with the need for a dynamically filled values.
This is a revised answer (using Postgres, but should work in Presto)
CREATE TABLE mytable (
item VARCHAR(200)
, location_id INTEGER
, forecast DECIMAL(10, 2)
, week_begin DATE
, pack_size INTEGER
, row_number INTEGER
);
INSERT INTO mytable (item, location_id, forecast, week_begin, pack_size, row_number)
VALUES
('green_pencil', 1234, 0.82, '2023-12-03', 6, 1),
('green_pencil', 1234, 0.82, '2023-12-10', 6, 2),
('green_pencil', 1234, 0.82, '2023-12-17', 6, 3),
('green_pencil', 1234, 1.23, '2023-12-24', 6, 4),
('green_pencil', 1234, 1.23, '2023-12-31', 6, 5),
('green_pencil', 1234, 1.23, '2024-01-07', 6, 6),
('blue_marker', 999, 4.31, '2023-12-03', 6, 1),
('blue_marker', 999, 6.47, '2023-12-10', 6, 2),
('blue_marker', 999, 6.47, '2023-12-17', 6, 3),
('blue_marker', 999, 6.47, '2023-12-24', 6, 4),
('blue_marker', 999, 6.47, '2023-12-31', 6, 5),
('blue_marker', 999, 6.47, '2024-01-07', 6, 6);
Query: Here I calculate units_to_store based on a simple running sum of forecast, then in the final query week_end_inventory is calculated from the running sum of units_to_store.
with inventory as (
SELECT item
, location_id
, forecast
, week_begin
, pack_size
, row_number
, SUM(forecast) OVER (
PARTITION BY item, location_id
ORDER BY week_begin ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS run_forecast
, case
when row_number = 1 then pack_size
when SUM(forecast) OVER (
PARTITION BY item, location_id
ORDER BY week_begin ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) / pack_size > 1 then pack_size
else 0
end as units_to_store
FROM mytable
)
select
item, location_id, week_begin, pack_size, forecast, units_to_store
, sum(units_to_store) OVER (
PARTITION BY item, location_id
ORDER BY week_begin ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) - run_forecast AS week_end_inventory
, row_number, run_forecast
from inventory
item | location_id | week_begin | pack_size | forecast | units_to_store | week_end_inventory | row_number | run_forecast |
---|---|---|---|---|---|---|---|---|
blue_marker | 999 | 2023-12-03 | 6 | 4.31 | 6 | 1.69 | 1 | 4.31 |
blue_marker | 999 | 2023-12-10 | 6 | 6.47 | 6 | 1.22 | 2 | 10.78 |
blue_marker | 999 | 2023-12-17 | 6 | 6.47 | 6 | 0.75 | 3 | 17.25 |
blue_marker | 999 | 2023-12-24 | 6 | 6.47 | 6 | 0.28 | 4 | 23.72 |
blue_marker | 999 | 2023-12-31 | 6 | 6.47 | 6 | -0.19 | 5 | 30.19 |
blue_marker | 999 | 2024-01-07 | 6 | 6.47 | 6 | -0.66 | 6 | 36.66 |
green_pencil | 1234 | 2023-12-03 | 6 | 0.82 | 6 | 5.18 | 1 | 0.82 |
green_pencil | 1234 | 2023-12-10 | 6 | 0.82 | 0 | 4.36 | 2 | 1.64 |
green_pencil | 1234 | 2023-12-17 | 6 | 0.82 | 0 | 3.54 | 3 | 2.46 |
green_pencil | 1234 | 2023-12-24 | 6 | 1.23 | 0 | 2.31 | 4 | 3.69 |
green_pencil | 1234 | 2023-12-31 | 6 | 1.23 | 0 | 1.08 | 5 | 4.92 |
green_pencil | 1234 | 2024-01-07 | 6 | 1.23 | 6 | 5.85 | 6 | 6.15 |
Original (and only partial) approach (assumed MySQL)