Tags: python, mathematical-optimization, modeling, pulp

Split variables into groups, each constrained to hold a specific number of variables, while optimizing group sums towards specific values


I have a number of variables, each assigned an integer value. I need to split these variables into three groups, with a predefined number of variables going into each group, while optimizing the sum of values in each group towards a predefined target. Each group sum should be as close as possible to its target, but may be above or below it. All variables must be used, and each variable can only be used once.

For example, I might have 10 variables...

Variable Value
A1 98
A2 20
A3 30
A4 50
A5 20
A6 34
A7 43
A8 21
A9 32
A10 54

...and the goal could be to create three groups:

Group #Variables Sum optimized towards
X 6 200
Y 2 100
Z 2 100

So group X should hold 6 variables and their sum should be as close as possible to 200, but I need to optimize for all three groups simultaneously.
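For example, simply putting A1, A2, A3, A4, A6 and A7 into group X would give a sum of 98 + 20 + 30 + 50 + 34 + 43 = 275, which is 75 above the target of 200, so a better selection of six variables is wanted.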

I've tried to set up PuLP to perform this task. I seem to have found a solution for creating a single group, but I cannot figure out how to split the variables into groups and optimize the assignments based on the sums for each group. Is there a way to do this?

Below is my code for producing the first group with the presented variables.

from pulp import LpMaximize, LpMinimize, LpProblem, lpSum, LpVariable, PULP_CBC_CMD, value, LpStatus

keys = ["A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8", "A9", "A10"]
data = [98,20,30,50,20,34,43,21,32,54]

problem_name = 'repex'

prob = LpProblem(problem_name, LpMaximize)

optiSum = 200 # Optimize towards this sum
variableCount = 6 # Number of variables that should be in the group

# Create one binary decision variable per data element
# (1 = the element is included in the group, 0 = it is not)
decision_variables = [LpVariable(str(i), cat='Binary') for i in range(len(data))]


# Build the constraint expressions
sumConstraint = lpSum(data[i] * x for i, x in enumerate(decision_variables))  # sum of the selected values
countConstraint = lpSum(decision_variables)  # number of selected elements

prob += (sumConstraint <= optiSum)          # group sum must not exceed the target
prob += (countConstraint == variableCount)  # exactly this many elements in the group
prob += sumConstraint                       # objective: maximise the group sum

# Solve
optimization_result = prob.solve(PULP_CBC_CMD(msg=0))
prob.writeLP(problem_name + ".lp" )
print("Status:", LpStatus[prob.status])
print("Optimal Solution to the problem: ", value(prob.objective))
print ("Individual decision_variables: ")
for v in prob.variables():
    print(v.name, "=", v.varValue)

Which produces the following output:

Status: Optimal
Optimal Solution to the problem:  200.0
Individual decision_variables:
0 = 0.0
1 = 1.0
2 = 0.0
3 = 1.0
4 = 0.0
5 = 1.0
6 = 1.0
7 = 1.0
8 = 1.0
9 = 0.0

Solution

  • This seems to be a fairly standard "assignment" problem.

    Let z_ij be a set of binary variables, with z_ij = 1 if object i is assigned to group j and 0 otherwise.

    Your objective is then to minimise the sum of the absolute deviations of the group sums from their target values - a sketch of the formulation and working example code below:
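    In symbols (writing v_i for the value of object i, s_j for the sum assigned to group j, t_j for its target sum, n_j for its required size, and e_j for its absolute error), the model the code below implements is roughly:

        \min \sum_j e_j
        \text{s.t.} \quad s_j = \sum_i v_i \, z_{ij}, \quad e_j \ge s_j - t_j, \quad e_j \ge t_j - s_j \quad \forall j
        \sum_i z_{ij} = n_j \quad \forall j, \qquad \sum_j z_{ij} = 1 \quad \forall i, \qquad z_{ij} \in \{0, 1\}

    The two inequalities on e_j are the standard linearisation of |s_j - t_j|: minimising e_j pushes it down onto whichever of the two differences is larger.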

    from pulp import LpMaximize, LpMinimize, LpProblem, lpSum, LpVariable, PULP_CBC_CMD, value, LpStatus
    
    data = [98,20,30,50,20,34,43,21,32,54]
    
    n_object = len(data)
    #object_keys = ["A" + str(i) for i in range(1, n_object + 1)]
    object_keys = range(n_object)
    
    group_sum_targets = [200, 100, 100]
    group_n_objects = [6, 2, 2]
    
    n_group = len(group_sum_targets)
    group_keys = range(n_group)
    
    problem_name = 'repex'
    
    # Seek to minimise absolute deviation from the target sums
    prob = LpProblem(problem_name, LpMinimize)
    
    # Primary Decision variables - the assignments
    z = LpVariable.dicts('z',
                         [(i, j) for i in object_keys for j in group_keys],
                         cat='Binary')
    
    # Aux. decision variables
    group_sums = LpVariable.dicts('group_sums', group_keys, cat='Continuous')
    group_abs_error = LpVariable.dicts('group_abs_error', group_keys, cat='Continuous')
    
    # Objective - assumes all groups evenly penalised for missing
    # their target sum, and penalty for 'over' and 'under' have same
    # weighting
    prob += lpSum([group_abs_error[j] for j in group_keys])
    
    # Constraints on groups
    for j in group_keys:
        prob += group_sums[j] == lpSum([z[(i, j)]*data[i] for i in object_keys])
        prob += group_abs_error[j] >= group_sums[j] - group_sum_targets[j]
        prob += group_abs_error[j] >= group_sum_targets[j] - group_sums[j]
    
        # Constrain number of objects used
        prob += lpSum([z[(i, j)] for i in object_keys]) == group_n_objects[j]
    
    # Constraints on objects
    for i in object_keys:
        # Every object used exactly once
        prob += lpSum([z[(i, j)] for j in group_keys]) == 1
    
    # Solve
    optimization_result = prob.solve(PULP_CBC_CMD(msg=0))
    
    
    print("Status:", LpStatus[prob.status])
    print("Optimal Solution to the problem: ", value(prob.objective))
    print ("Individual decision_variables: ")
    for v in prob.variables():
        print(v.name, "=", v.varValue)
    

    Which gives me the following output (only showing the non-zero z's). As you can see, the groups contain 6, 2, and 2 objects as desired, and the sums are reasonably close to the targets.

    Status: Optimal
    Optimal Solution to the problem:  34.0
    Individual decision_variables:
    group_abs_error_0 = 9.0
    group_abs_error_1 = 18.0
    group_abs_error_2 = 7.0
    group_sums_0 = 191.0
    group_sums_1 = 118.0
    group_sums_2 = 93.0
    z_(0,_1) = 1.0
    z_(1,_0) = 1.0
    z_(2,_0) = 1.0
    z_(3,_2) = 1.0
    z_(4,_1) = 1.0
    z_(5,_0) = 1.0
    z_(6,_2) = 1.0
    z_(7,_0) = 1.0
    z_(8,_0) = 1.0
    z_(9,_0) = 1.0
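
    If you want the assignment in a more readable form, a small post-processing sketch like the one below (reusing z, data, object_keys and group_keys from the code above, after the solve) pulls out the members and the achieved sum of each group:

        # Collect the objects assigned to each group (varValue is a float,
        # so compare against 0.5 rather than testing equality with 1)
        for j in group_keys:
            members = [i for i in object_keys if z[(i, j)].varValue > 0.5]
            group_sum = sum(data[i] for i in members)
            print(f"Group {j}: objects {members}, sum = {group_sum}")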