Tags: python, optimization, gurobi, ampl, coin-or-cbc

AMPL / Python - Performance & Memory issue


I've written a simple model that optimizes item location assignment for a warehouse of 30,000 locations.

My code works with 3 items and Gurobi or CPLEX, but:

  • With CBC: no result after 1 minute, even with only 3 items …
  • With HiGHS: Error -1 for call Highs_passH
  • With 3,000 items and Gurobi or CPLEX: Unexpected end of file while reading AMPL output.

But it's such a simple model, and the data is not huge. Can you help me?

AMPL Model:

set Items;  # Set of items
set Locations;  # Set of locations

param Quantity{Items};  # Demand quantity for each item
param Cost_Location{Locations};  # Cost of each location
param Cost_Move := 1;  # Cost of moving an item
param valid_allocations{Items, Locations} binary;  # Table of valid allocations for each item

var assign{Items, Locations} binary;  # Decision variable: 1 if the item is assigned to the location, 0 otherwise
var prev_loc{Items, Locations} binary;  # Decision variable: 1 if the item is currently at the location, 0 otherwise

minimize Total_Cost:
    sum{i in Items, j in Locations} assign[i,j] * (Cost_Location[j] + Cost_Move * prev_loc[i,j]);

subject to Assignment_Constraint{i in Items}:
    sum{j in Locations} assign[i,j] = 1;  

subject to Valid_Allocation_Constraint{i in Items, j in Locations}:
    assign[i,j] <= valid_allocations[i,j];

subject to AtLeastOneAssignment_Constraint{i in Items}:
    sum{j in Locations} assign[i,j] >= 1;

Python Code:

    from amplpy import AMPL
    import polars as pl
    
    
    ampl = AMPL()
    ampl.option["solver"] = "gurobi"
    ampl.read("allocation.model")
    
    items_set = ampl.getSet("Items")
    locations_set = ampl.getSet("Locations")
    quantity_demand_param = ampl.getParameter("Quantity")
    cost_location_param = ampl.getParameter("Cost_Location")
    valid_allocations_param = ampl.getParameter("valid_allocations")
    
    orders = pl.read_csv("./data/orders.csv", separator=';')
    locations = pl.read_csv("./data/locations.csv", separator=';')

    # The model only works with this filter applied (3 items)
    orders = orders.filter(pl.col("ITEM_ID").is_in([34682, 34657, 29840]))
    
    items_set.setValues(orders.select("ITEM_ID").unique().to_dict(as_series=False)["ITEM_ID"])
    locations_set.setValues(locations.select("LOCATION_ID").unique().to_dict(as_series=False)["LOCATION_ID"])
    
    
    quantity_demand_param.setValues(dict(orders.unique(subset = "ITEM_ID").select(["ITEM_ID","QTY"]).iter_rows()))
    cost_location_param.setValues(dict(locations.unique(subset = "LOCATION_ID").select(["LOCATION_ID","WALKING_DISTANCE"]).iter_rows()))
    cartesian_locations = orders.select("ITEM_ID").unique().join(locations,how='cross', suffix="_LOCATIONS" )
    cartesian_locations = cartesian_locations.select(pl.col("ITEM_ID"),pl.col("LOCATION_ID"),(pl.col("CART_STD1") | pl.col("CART_STD2") | pl.col("CART_DEMI-HAUT") | pl.col("CART_VOLUMINEUX")))
    cartesian_locations = cartesian_locations.with_columns(pl.col("CART_STD1").cast(pl.Int64))  # cast the boolean flag to 0/1
    cartesian_locations = cartesian_locations.pivot( values="CART_STD1", index="ITEM_ID", columns="LOCATION_ID",aggregate_function="first")
    cartesian_locations = cartesian_locations.to_pandas()
    cartesian_locations = cartesian_locations.set_index("ITEM_ID").transpose()
    valid_allocations_param.setValues(cartesian_locations.unstack())
    
    ampl.solve()

Solution

  • In this model you have parameters and variables indexed over two sets (Items and Locations). Since you have 3,000 items and 30,000 locations, this means 90,000,000 entries for each of those parameters and variables if all items are available in all locations. This may still be possible to solve with Gurobi or CPLEX on a machine with a substantial amount of memory, but it is way bigger than anything that open-source solvers are able to tackle nowadays.

    To get an idea of the model size, you can print the size of the data being passed to AMPL:

    print(cartesian_locations.stack().shape)
    

    You can filter this data to only include the items available in each location. That will help improve performance and reduce memory usage substantially.
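
    For instance, a minimal polars sketch of that filtering, assuming (as in the code above) that the OR of the CART_* columns marks a valid allocation:

    # Keep only the valid (item, location) pairs instead of pivoting to a
    # dense Items x Locations matrix.
    valid_pairs = (
        orders.select("ITEM_ID").unique()
        .join(locations, how="cross")
        .filter(
            pl.col("CART_STD1") | pl.col("CART_STD2")
            | pl.col("CART_DEMI-HAUT") | pl.col("CART_VOLUMINEUX")
        )
        .select(["ITEM_ID", "LOCATION_ID"])
    )
    print(valid_pairs.shape)  # far fewer rows than 3,000 x 30,000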

    Note that instead of having variables indexed over {Items, Locations}, you should index them over the valid allocations to reduce the number of variables. This also makes the constraint Valid_Allocation_Constraint unnecessary. Using a set ItemLocations containing the valid allocations, you can model it as follows:

    set Items;  # Set of items
    set Locations;  # Set of locations
    set ItemLocations within {Items, Locations}; # Valid allocations
    
    param Quantity{Items};  # Demand quantity for each item
    param Cost_Location{Locations};  # Cost of each location
    param Cost_Move := 1;  # Cost of moving an item
    
    var assign{ItemLocations} binary;  # Decision variable: 1 if the item is assigned to the location, 0 otherwise
    var prev_loc{ItemLocations} binary;  # Decision variable: 1 if the item is currently at the location, 0 otherwise
    
    minimize Total_Cost:
        sum{(i,j) in ItemLocations} assign[i,j] * (Cost_Location[j] + Cost_Move * prev_loc[i,j]);
    
    subject to Assignment_Constraint{i in Items}:
        sum{(i,j) in ItemLocations} assign[i,j] = 1;
    
    # redundant constraint as Assignment_Constraint enforces that each item is assigned to exactly one location:
    #subject to AtLeastOneAssignment_Constraint{i in Items}:
    #   sum{(i,j) in ItemLocations} assign[i,j] >= 1;
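
    On the Python side, you can then load the pair set directly, with no dense matrix or pivot. A minimal amplpy sketch, assuming the valid_pairs frame from the filtering step above:

    items_set.setValues(valid_pairs["ITEM_ID"].unique().to_list())
    locations_set.setValues(locations["LOCATION_ID"].unique().to_list())
    # A 2-dimensional set takes a list of (item, location) tuples:
    ampl.getSet("ItemLocations").setValues(list(valid_pairs.iter_rows()))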
    

    I would also note that even though prev_loc is declared as a variable, it seems to be a parameter indicating whether or not the item is currently in a given location. You can switch it to a parameter to reduce the model size, eventually to a point where you can use an open-source solver. You can also filter the assignments to only include those under a certain cost (i.e., exclude all expensive reallocations): place in ItemLocations just the (item, location) pairs that seem worth considering, rather than all valid allocations. The solution is then not guaranteed to be optimal, especially if you pick a low threshold, but it should still be quite good.
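
    For instance, a rough sketch of that pruning with polars (MAX_WALK is a hypothetical threshold, not a value from the original data; prev_loc would also be switched to a param in the model):

    # Keep only candidate locations whose walking distance is below an
    # assumed threshold; tune MAX_WALK to trade solution quality for size.
    MAX_WALK = 100.0  # assumed value
    cheap_ids = locations.filter(pl.col("WALKING_DISTANCE") <= MAX_WALK)["LOCATION_ID"]
    valid_pairs = valid_pairs.filter(pl.col("LOCATION_ID").is_in(cheap_ids))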