python, optimization, linear-programming, mixed-integer-programming

How can I minimize the distance from a given input distribution?


I have a list of customers, and each of them can be "activated" in one of three different ways (A, B or C):

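import numpy as np
import pandas as pd

n = 1000
df = pd.DataFrame(list(range(0, n)), columns=['Customer_ID'])
# 1 = the customer may be activated on that channel, 0 = not allowed
df['A'] = np.random.randint(2, size=n)
df['B'] = np.random.randint(2, size=n)
df['C'] = np.random.randint(2, size=n)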

Each customer can be activated on "A", on "B", or on "C", but on only one of them, and only if the Boolean flag for that type of activation equals 1.

As input, I have the target counts for the final activations, e.g.:

Target_A = 500
Target_B = 250
Target_C = 250

The random values in the code are an input to the optimizer and indicate whether or not each customer can be activated in that way. How can I assign each customer to only one activation type so that the final targets are respected? In other words, how can I minimize the distance between the actual activation counts and the target counts?
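
One possible way to state the question formally, assuming "distance" means the sum of absolute deviations between the achieved counts and the targets. The symbols are introduced here for illustration only and do not come from the original post: a_{ik} is the 0/1 eligibility flag of customer i for channel k, x_{ik} is the 0/1 decision to activate customer i on channel k, T_k is the target count, and d_k is an auxiliary variable bounding the deviation for channel k. The sketch also assumes a customer may be left unassigned, which is why the per-customer constraint is an inequality.

```latex
\begin{aligned}
\min_{x,\,d}\quad & \sum_{k \in \{A,B,C\}} d_k \\
\text{s.t.}\quad & d_k \ge T_k - \sum_{i=1}^{n} x_{ik}, && k \in \{A,B,C\} \\
& d_k \ge \sum_{i=1}^{n} x_{ik} - T_k, && k \in \{A,B,C\} \\
& \sum_{k \in \{A,B,C\}} x_{ik} \le 1, && i = 1,\dots,n \\
& x_{ik} \le a_{ik}, \quad x_{ik} \in \{0,1\} && \text{for all } i,\ k
\end{aligned}
```

The two inequalities on d_k together force d_k >= |T_k - sum_i x_{ik}|, and because d_k is minimized it settles at exactly that absolute deviation.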


Solution

  • Do you have any tested examples? I think this might work, but I'm not sure:

    import pandas as pd
    import numpy as np
    from pulp import LpProblem, LpVariable, LpMinimize, lpSum
    
    prob = LpProblem("problem", LpMinimize)
    
    
    n = 1000
    df = pd.DataFrame(list(range(0,n)), columns = ['Customer_ID'])
    df['A'] = np.random.randint(2, size=n)
    df['B'] = np.random.randint(2, size=n)
    df['C'] = np.random.randint(2, size=n)
    
    Target_A = 500
    Target_B = 250
    Target_C = 250
    
    
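    # 0/1 decision variables: A[i] == 1 means customer i is activated on "A".
    # 'Binary' is the category name PuLP documents for 0/1 variables.
    A = LpVariable.dicts("A", range(0, n), lowBound=0, upBound=1, cat='Binary')
    B = LpVariable.dicts("B", range(0, n), lowBound=0, upBound=1, cat='Binary')
    C = LpVariable.dicts("C", range(0, n), lowBound=0, upBound=1, cat='Binary')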
    
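    # O1, O2, O3 bound the absolute deviation from the targets for A, B and C
    O1 = LpVariable("O1", cat='Integer')
    O2 = LpVariable("O2", cat='Integer')
    O3 = LpVariable("O3", cat='Integer')
    
    # objective: minimize the total absolute deviation from the targets
    prob += O1 + O2 + O3
    
    # constraints: each pair forces O >= |Target - number of activations|,
    # which linearizes the absolute value
    prob += O1 >= Target_A - lpSum(A)
    prob += O1 >= lpSum(A) - Target_A
    prob += O2 >= Target_B - lpSum(B)
    prob += O2 >= lpSum(B) - Target_B
    prob += O3 >= Target_C - lpSum(C)
    prob += O3 >= lpSum(C) - Target_C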
    
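    for idx in range(0, n):
        # a customer can't be activated on more than one channel
        prob += A[idx] + B[idx] + C[idx] <= 1
        # a customer can't be activated on a channel whose flag is 0
        # (int() turns the NumPy integer into a plain Python int)
        prob += A[idx] <= int(df.loc[idx, 'A'])
        prob += B[idx] <= int(df.loc[idx, 'B'])
        prob += C[idx] <= int(df.loc[idx, 'C'])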
    
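    prob.solve()
    
    print("difference:", prob.objective.value())
    # activation counts actually reached, to compare against the targets
    print("A:", sum(v.value() for v in A.values()))
    print("B:", sum(v.value() for v in B.values()))
    print("C:", sum(v.value() for v in C.values()))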
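
Since the answer above is explicitly untested, it may help to check whatever the solver returns independently of PuLP. The following is a minimal sketch in plain Python, not part of the original answer. The function name `check_assignment` and the dictionary layouts are hypothetical, chosen only for illustration, and "distance" is assumed to mean the sum of absolute deviations from the targets. Because each customer is a single dictionary key, the "only one activation per customer" rule holds by construction.

```python
from collections import Counter


def check_assignment(assignment, eligibility, targets):
    """Return the sum of absolute deviations from the targets.

    All three layouts are assumptions made for this sketch:
      assignment:  {customer_id: channel name, or None if not activated}
      eligibility: {customer_id: {channel name: 0 or 1}}
      targets:     {channel name: desired number of activations}
    Raises ValueError if a customer is activated on a channel whose flag is 0.
    """
    for cust, channel in assignment.items():
        if channel is not None and eligibility[cust][channel] != 1:
            raise ValueError(f"customer {cust} is not eligible for {channel}")
    # Count activations per channel, ignoring customers left unassigned.
    counts = Counter(ch for ch in assignment.values() if ch is not None)
    return sum(abs(targets[ch] - counts[ch]) for ch in targets)


# Tiny made-up example: three customers, one target activation per channel.
eligibility = {
    0: {"A": 1, "B": 0, "C": 1},
    1: {"A": 1, "B": 1, "C": 0},
    2: {"A": 0, "B": 0, "C": 0},  # cannot be activated at all
}
assignment = {0: "C", 1: "A", 2: None}
targets = {"A": 1, "B": 1, "C": 1}

distance = check_assignment(assignment, eligibility, targets)
print("distance:", distance)
```

With the PuLP model above, `assignment` could be filled by picking, for each customer, the channel whose variable has value 1 (for example via `A[i].value()`), and the result compared with the objective value the solver reports.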