What is the best way to dynamically fill values in SQL based on conditions from other columns from previous rows?

This is a problem I would normally do in python, but wanted to see if it could be done efficiently in SQL.

I have a sample forecasted sales dataset that looks like this:

item	location_id	forecast	week_begin	pack_size	row_number
green_pencil	1234	1.43	12/3/23	6	1
green_pencil	1234	2.67	12/10/23	6	2
green_pencil	1234	0.82	12/17/23	6	3
green_pencil	1234	1.23	12/24/23	6	4
green_pencil	1234	1.23	12/31/23	6	5
green_pencil	1234	1.23	1/7/24	6	6
blue_marker	999	4.31	12/3/23	6	1
blue_marker	999	6.47	12/10/23	6	2
blue_marker	999	6.47	12/17/23	6	3
blue_marker	999	6.47	12/24/23	6	4
blue_marker	999	6.47	12/31/23	6	5
blue_marker	999	6.47	1/7/24	6	6

The data is the forecasted sales units for the next 6 weeks for different items going to different stores. A row number has already been added to each item / location / week. The pack_size columns is how many units get sent to the store when inventory is needed.

The goal is to create two columns units_to_store and week_end_inventory to determine when a store should be sent a pack of an item based on the forecasted sales. The ideal final state dataset is below (just for green_pencil, but would want it for every item / store combo):

item	location_id	forecast	week_begin	pack_size	row_number	units_to_store	week_end_inventory
green_pencil	1234	1.43	12/3/23	6	1	6	4.57
green_pencil	1234	2.67	12/10/23	6	2	0	1.9
green_pencil	1234	0.82	12/17/23	6	3	0	1.08
green_pencil	1234	1.23	12/24/23	6	4	6	-0.15
green_pencil	1234	1.23	12/31/23	6	5	0	4.77
green_pencil	1234	1.23	1/7/24	6	6	0	3.54

The first week will always start with the items pack_size in this case, 6 units. Then the goal is to take the forecasted sales and have a running sum until that number gets below 0, in which a new pack should be sent to the stores and the running sum starts again with the pack_size. The week_end_inventory for the first week will be the pack_size - forecast and then the forecast will be subtracted in the next rows / weeks until it gets below 0 and the process will start over again.

What is the best way to do this? I have tried to use the lag function but have trouble with the need for a dynamically filled values.

Solution

This is a revised answer (using Postgres, but should work in Presto)

CREATE TABLE mytable (
      item VARCHAR(200)
    , location_id INTEGER
    , forecast DECIMAL(10, 2)
    , week_begin DATE
    , pack_size INTEGER
    , row_number INTEGER
    );

INSERT INTO mytable (item, location_id, forecast, week_begin, pack_size, row_number) 
VALUES 
('green_pencil', 1234, 0.82, '2023-12-03', 6, 1),
('green_pencil', 1234, 0.82, '2023-12-10', 6, 2),
('green_pencil', 1234, 0.82, '2023-12-17', 6, 3),
('green_pencil', 1234, 1.23, '2023-12-24', 6, 4),
('green_pencil', 1234, 1.23, '2023-12-31', 6, 5),
('green_pencil', 1234, 1.23, '2024-01-07', 6, 6),
('blue_marker', 999, 4.31, '2023-12-03', 6, 1),
('blue_marker', 999, 6.47, '2023-12-10', 6, 2),
('blue_marker', 999, 6.47, '2023-12-17', 6, 3),
('blue_marker', 999, 6.47, '2023-12-24', 6, 4),
('blue_marker', 999, 6.47, '2023-12-31', 6, 5),
('blue_marker', 999, 6.47, '2024-01-07', 6, 6);

Query: Here I calculate units_to_store based on a simple running sum of forecast, then in the final query week_end_inventory is calculated from the running sum of units_to_store.

with inventory as (
    SELECT item
        , location_id
        , forecast
        , week_begin
        , pack_size
        , row_number
        , SUM(forecast) OVER (
            PARTITION BY item, location_id 
            ORDER BY week_begin ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
            ) AS run_forecast
       , case 
            when row_number = 1 then pack_size
            when SUM(forecast) OVER (
                PARTITION BY item, location_id 
                ORDER BY week_begin ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
                )  / pack_size > 1 then pack_size 
            else 0
        end as units_to_store
    FROM mytable
    )
select
   item, location_id, week_begin, pack_size, forecast, units_to_store
  , sum(units_to_store) OVER (
            PARTITION BY item, location_id 
            ORDER BY week_begin ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
            ) - run_forecast AS week_end_inventory
  , row_number, run_forecast
from inventory

item	location_id	week_begin	pack_size	forecast	units_to_store	week_end_inventory	row_number	run_forecast
blue_marker	999	2023-12-03	6	4.31	6	1.69	1	4.31
blue_marker	999	2023-12-10	6	6.47	6	1.22	2	10.78
blue_marker	999	2023-12-17	6	6.47	6	0.75	3	17.25
blue_marker	999	2023-12-24	6	6.47	6	0.28	4	23.72
blue_marker	999	2023-12-31	6	6.47	6	-0.19	5	30.19
blue_marker	999	2024-01-07	6	6.47	6	-0.66	6	36.66
green_pencil	1234	2023-12-03	6	0.82	6	5.18	1	0.82
green_pencil	1234	2023-12-10	6	0.82	0	4.36	2	1.64
green_pencil	1234	2023-12-17	6	0.82	0	3.54	3	2.46
green_pencil	1234	2023-12-24	6	1.23	0	2.31	4	3.69
green_pencil	1234	2023-12-31	6	1.23	0	1.08	5	4.92
green_pencil	1234	2024-01-07	6	1.23	6	5.85	6	6.15

fiddle

Original (and only partial) approach (assumed MySQL)

fiddle