python optimization linear-programming mixed-integer-programming

how can I minimize the distance from a given input distribution?

I have a list of customers and each of them can be "activated" in four different ways:

n= 1000
df = pd.DataFrame(list(range(0,n)), columns = ['Customer_ID'])
df['A'] = np.random.randint(2, size=n)
df['B'] = np.random.randint(2, size=n)
df['C'] = np.random.randint(2, size=n)

each customer can be activated either on "A" or on "B" or on "C" and only if the Boolean related to the type of activation is equal to 1.

In input i have the count of the final activations. es:

Target_A = 500
Target_B = 250
Target_C = 250

The random values in code are an input for the optimizer and represent the possibility or not to activate the client in that way. How can I associate the client with only one of those in order to respect the final targets? How can I minimize the distance between the count of real activation and the input data?

Solution

Do you have any tested examples? I think this might work but not sure:

import pandas as pd
import numpy as np
from pulp import LpProblem, LpVariable, LpMinimize, LpInteger, lpSum, value

prob = LpProblem("problem", LpMinimize)


n= 1000
df = pd.DataFrame(list(range(0,n)), columns = ['Customer_ID'])
df['A'] = np.random.randint(2, size=n)
df['B'] = np.random.randint(2, size=n)
df['C'] = np.random.randint(2, size=n)

Target_A = 500
Target_B = 250
Target_C = 250


A = LpVariable.dicts("A", range(0, n), lowBound=0, upBound=1, cat='Boolean')
B = LpVariable.dicts("B", range(0, n), lowBound=0, upBound=1, cat='Boolean')
C = LpVariable.dicts("C", range(0, n), lowBound=0, upBound=1, cat='Boolean')

O1 = LpVariable("O1", cat='Integer')
O2 = LpVariable("O2", cat='Integer')
O3 = LpVariable("O3", cat='Integer')

#objective
prob += O1 + O2 + O3

#constraints
prob += O1 >= Target_A - lpSum(A)
prob += O1 >= lpSum(A) - Target_A
prob += O2 >= Target_B - lpSum(B)
prob += O2 >= lpSum(B) - Target_B
prob += O3 >= Target_C - lpSum(C)
prob += O3 >= lpSum(C) - Target_C

for idx in range(0, n):
    prob += A[idx] + B[idx] + C[idx] <= 1 #cant activate more than 1
    prob += A[idx] <= df['A'][idx] #cant activate if 0
    prob += B[idx] <= df['B'][idx] 
    prob += C[idx] <= df['C'][idx] 

prob.solve()    

print("difference:", prob.objective.value())