Search code examples
sqlprestotrino

What is the best way to dynamically fill values in SQL based on conditions from other columns from previous rows?


This is a problem I would normally do in python, but wanted to see if it could be done efficiently in SQL.

I have a sample forecasted sales dataset that looks like this:

item location_id forecast week_begin pack_size row_number
green_pencil 1234 1.43 12/3/23 6 1
green_pencil 1234 2.67 12/10/23 6 2
green_pencil 1234 0.82 12/17/23 6 3
green_pencil 1234 1.23 12/24/23 6 4
green_pencil 1234 1.23 12/31/23 6 5
green_pencil 1234 1.23 1/7/24 6 6
blue_marker 999 4.31 12/3/23 6 1
blue_marker 999 6.47 12/10/23 6 2
blue_marker 999 6.47 12/17/23 6 3
blue_marker 999 6.47 12/24/23 6 4
blue_marker 999 6.47 12/31/23 6 5
blue_marker 999 6.47 1/7/24 6 6

The data is the forecasted sales units for the next 6 weeks for different items going to different stores. A row number has already been added to each item / location / week. The pack_size columns is how many units get sent to the store when inventory is needed.

The goal is to create two columns units_to_store and week_end_inventory to determine when a store should be sent a pack of an item based on the forecasted sales. The ideal final state dataset is below (just for green_pencil, but would want it for every item / store combo):

item location_id forecast week_begin pack_size row_number units_to_store week_end_inventory
green_pencil 1234 1.43 12/3/23 6 1 6 4.57
green_pencil 1234 2.67 12/10/23 6 2 0 1.9
green_pencil 1234 0.82 12/17/23 6 3 0 1.08
green_pencil 1234 1.23 12/24/23 6 4 6 -0.15
green_pencil 1234 1.23 12/31/23 6 5 0 4.77
green_pencil 1234 1.23 1/7/24 6 6 0 3.54

The first week will always start with the items pack_size in this case, 6 units. Then the goal is to take the forecasted sales and have a running sum until that number gets below 0, in which a new pack should be sent to the stores and the running sum starts again with the pack_size. The week_end_inventory for the first week will be the pack_size - forecast and then the forecast will be subtracted in the next rows / weeks until it gets below 0 and the process will start over again.

What is the best way to do this? I have tried to use the lag function but have trouble with the need for a dynamically filled values.


Solution

  • This is a revised answer (using Postgres, but should work in Presto)

    CREATE TABLE mytable (
          item VARCHAR(200)
        , location_id INTEGER
        , forecast DECIMAL(10, 2)
        , week_begin DATE
        , pack_size INTEGER
        , row_number INTEGER
        );
    
    INSERT INTO mytable (item, location_id, forecast, week_begin, pack_size, row_number) 
    VALUES 
    ('green_pencil', 1234, 0.82, '2023-12-03', 6, 1),
    ('green_pencil', 1234, 0.82, '2023-12-10', 6, 2),
    ('green_pencil', 1234, 0.82, '2023-12-17', 6, 3),
    ('green_pencil', 1234, 1.23, '2023-12-24', 6, 4),
    ('green_pencil', 1234, 1.23, '2023-12-31', 6, 5),
    ('green_pencil', 1234, 1.23, '2024-01-07', 6, 6),
    ('blue_marker', 999, 4.31, '2023-12-03', 6, 1),
    ('blue_marker', 999, 6.47, '2023-12-10', 6, 2),
    ('blue_marker', 999, 6.47, '2023-12-17', 6, 3),
    ('blue_marker', 999, 6.47, '2023-12-24', 6, 4),
    ('blue_marker', 999, 6.47, '2023-12-31', 6, 5),
    ('blue_marker', 999, 6.47, '2024-01-07', 6, 6);
    

    Query: Here I calculate units_to_store based on a simple running sum of forecast, then in the final query week_end_inventory is calculated from the running sum of units_to_store.

    with inventory as (
        SELECT item
            , location_id
            , forecast
            , week_begin
            , pack_size
            , row_number
            , SUM(forecast) OVER (
                PARTITION BY item, location_id 
                ORDER BY week_begin ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
                ) AS run_forecast
           , case 
                when row_number = 1 then pack_size
                when SUM(forecast) OVER (
                    PARTITION BY item, location_id 
                    ORDER BY week_begin ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
                    )  / pack_size > 1 then pack_size 
                else 0
            end as units_to_store
        FROM mytable
        )
    select
       item, location_id, week_begin, pack_size, forecast, units_to_store
      , sum(units_to_store) OVER (
                PARTITION BY item, location_id 
                ORDER BY week_begin ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
                ) - run_forecast AS week_end_inventory
      , row_number, run_forecast
    from inventory
    
    item location_id week_begin pack_size forecast units_to_store week_end_inventory row_number run_forecast
    blue_marker 999 2023-12-03 6 4.31 6 1.69 1 4.31
    blue_marker 999 2023-12-10 6 6.47 6 1.22 2 10.78
    blue_marker 999 2023-12-17 6 6.47 6 0.75 3 17.25
    blue_marker 999 2023-12-24 6 6.47 6 0.28 4 23.72
    blue_marker 999 2023-12-31 6 6.47 6 -0.19 5 30.19
    blue_marker 999 2024-01-07 6 6.47 6 -0.66 6 36.66
    green_pencil 1234 2023-12-03 6 0.82 6 5.18 1 0.82
    green_pencil 1234 2023-12-10 6 0.82 0 4.36 2 1.64
    green_pencil 1234 2023-12-17 6 0.82 0 3.54 3 2.46
    green_pencil 1234 2023-12-24 6 1.23 0 2.31 4 3.69
    green_pencil 1234 2023-12-31 6 1.23 0 1.08 5 4.92
    green_pencil 1234 2024-01-07 6 1.23 6 5.85 6 6.15

    fiddle


    Original (and only partial) approach (assumed MySQL)

    fiddle