Search code examples
pythonoptimizationpulp

How to identify optimal item prices given a list of quantities and orders?


I have a set of orders where I have the total order value and item quantities but not the cost of individual items. I'm trying to back into what item prices would best explain these totals given the quantity for each order. Note that there are hundreds of items and thousands of orders but I've provided an example problem below.

So far I've tried creating a linear model to find weights but I think a direct optimization model may be a better approach so I've started using pulp to find item prices that best explain these totals but am running into some issues for formulating the optimization problem...

Order ID Order Total Item 1 Quantity Item 2 Quantity Item 3 Quantity
1 $15.00 1 0 2
2 $23.30 0 3 1
3 $17.50 2 0 0
4 $27.25 0 1 1

I'm creating LpVariables as:

itemPrices = [
    pulp.LpVariable(f"Item-{mi}", 0) for mi in sMenuItemIds
]


# Objective
prob += pulp.lpSum(
    itemVars
)

# ABS()
for _, frow in orderDataFrame.iterrows():
    prob += np.sum(frow[mi] * itemVars[sItemIndex[mi]] for mi in sMenuItemIds) >= frow['order_total']
    prob += np.sum(frow[mi] * itemVars[sItemIndex[mi]] for mi in sMenuItemIds) >=  -1 * frow['order_total']

When I run this problem all item values are set to $0.0 so I'm not accurately formulating the optimization problem. I'm trying to minimize the difference of SUM(Item Quantity * Item Price) and the order_total. Any suggestions on how to better approach this problem?


Solution

  • Here is a formulation and a few ideas that might help.

    I popped your data into a .csv file and removed the "$" to make this work. Realize that the case you gave is infeasible (there are no EXACT prices that make those 4 equations work). The model should be formulated to allow for slack using the constraint as shown such that the expenditure is at least the Order Total and let the model try to minimize the prices.

    I saw that you were thinking about ABS() for a difference maybe? I took the approach that the prices had to at least cover the order cost, so ABS() difference shouldn't need to be considered, unless there is more info you have.

    I also monkeyed with an alternate objective (shown) to look at the "grand total" and minimize that, which comes out the same, and I think that is provably always the case, but I'd need to think on it... :)

    I included a little checker at the end to see how well we did overall. If the records are legit/solvable, the model should come up with the same as the order total, it is a straightforward Linear Program.

    Code:

    import pandas as pd
    import numpy as np
    import pulp
    
    df = pd.read_csv('pricer_data.csv')
    df = df.set_index('Order ID')
    
    prob = pulp.LpProblem('pricer')
    
    # cheater for item ids... you could do something more creative here...
    items = list(range(1,4))
    
    # I find this easier (and clearer than making a list of vars...)
    prices = pulp.LpVariable.dicts('price', items, lowBound=0)
    
    # Objective
    # This objective will give you the lowest sum of PRICES 
    prob += pulp.lpSum(prices)
    
    # lowest total cost objective
    # prob += pulp.lpSum(df.iloc[r][f'Item {i} Quantity'] * prices[i] for i in items for r in range(len(df)))
    
    # Constraints
    for _, row in df.iterrows():
        prob += pulp.lpSum(row[f'Item {i} Quantity'] * prices[i] for i in items ) >= row['Order Total']
    
    print(prob)
    soln = prob.solve()
    print(soln)
    for item in prices.keys():
        print(f'item {item} price: {prices[item].varValue}')
    
    
    # let's check the grand total...
    grand_total = sum(df['Order Total'])
    
    # make a quantity matrix
    qty_matrix = df.reset_index().drop(columns=['Order ID', 'Order Total']).to_numpy()
    print(qty_matrix)
    
    price_array = np.array([prices[i].varValue for i in prices.keys()])
    order_costs = qty_matrix.dot(price_array.T)
    print(order_costs)
    
    model_total_cost = np.sum(order_costs)
    print(f'total from records: {grand_total}')
    print(f'model fit costs: {model_total_cost}')
    

    Output (unabridged... ):

    pricer:
    MINIMIZE
    1*price_1 + 1*price_2 + 1*price_3 + 0
    SUBJECT TO
    _C1: price_1 + 2 price_3 >= 15
    
    _C2: 3 price_2 + price_3 >= 23.3
    
    _C3: 2 price_1 >= 17.5
    
    _C4: price_2 + price_3 >= 27.25
    
    VARIABLES
    price_1 Continuous
    price_2 Continuous
    price_3 Continuous
    
    Welcome to the CBC MILP Solver 
    Version: 2.10.3 
    Build Date: Dec 15 2019 
    
    command line - /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pulp/solverdir/cbc/osx/64/cbc /var/folders/rt/5gc2md7s7y5bw8ddt74tz5vw0000gp/T/ec6ca9afa2b143fe845f38050ccae992-pulp.mps timeMode elapsed branch printingOptions all solution /var/folders/rt/5gc2md7s7y5bw8ddt74tz5vw0000gp/T/ec6ca9afa2b143fe845f38050ccae992-pulp.sol (default strategy 1)
    At line 2 NAME          MODEL
    At line 3 ROWS
    At line 9 COLUMNS
    At line 20 RHS
    At line 25 BOUNDS
    At line 26 ENDATA
    Problem MODEL has 4 rows, 3 columns and 7 elements
    Coin0008I MODEL read with 0 errors
    Option for timeMode changed from cpu to elapsed
    Presolve 3 (-1) rows, 3 (0) columns and 6 (-1) elements
    0  Obj 8.75 Primal inf 38.141664 (3)
    2  Obj 36
    Optimal - objective value 36
    After Postsolve, objective 36, infeasibilities - dual 0 (0), primal 0 (0)
    Optimal objective 36 - 2 iterations time 0.002, Presolve 0.00
    Option for printingOptions changed from normal to all
    Total time (CPU seconds):       0.00   (Wallclock seconds):       0.00
    
    1
    item 1 price: 8.75
    item 2 price: 24.125
    item 3 price: 3.125
    [[1 0 2]
     [0 3 1]
     [2 0 0]
     [0 1 1]]
    [15.   75.5  17.5  27.25]
    total from records: 83.05
    model fit costs: 135.25