python algorithm linear-programming job-scheduling or-tools

Scheduling optimization to minimize the number of timeslots (with constraints)

I'm working on a scheduling optimization problem where we have a set of tasks that need to be completed within a certain timeframe.

Each task has a schedule that specifies a list of time slots when it can be performed. The schedule for each task can be different depending on the weekday.

Here is small sample (reduced number of tasks and time slots):

task_availability_map = {
    "T1" : [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "T2" : [0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "T3" : [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "T4" : [0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "T5" : [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "T6" : [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
    "T7" : [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0],
    "T8" : [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0],
    "T9" : [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    "T10": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
}

The constraint is that only up to N tasks can be performed in parallel within the same time slot (if they overlap). The group of parallel tasks always takes the same amount of time regardless of whether 1 or N are being done.

The objective is to minimize the number of time slots.

I've tried a brute force approach that generates all time slot index permutations. For each index in a given permutation, get all tasks that can be scheduled and add them to a list of tasks to be excluded in the next iteration. Once all iterations for a given permutation are completed, add the number of time slots and the combination of indices to a list.

def get_tasks_by_timeslot(timeslot, tasks_to_exclude):
    for task in task_availability_map.keys():
        if task in tasks_to_exclude:
            continue
        if task_availability_map[task][timeslot] == 1:
            yield task

total_timeslot_count = len(task_availability_map.values()[0]) # 17
timeslot_indices = range(total_timeslot_count)

timeslot_index_permutations = list(itertools.permutations(timeslot_indices))

possible_schedules = []

for timeslot_variation in timeslot_index_permutations:
    tasks_already_scheduled = []
    current_schedule = []
    for t in timeslot_variation:
        tasks = list(get_tasks_by_timeslot(t, tasks_already_scheduled))
        if len(tasks) == 0:
            continue
        elif len(tasks) > MAX_PARALLEL_TASKS:
            break
        tasks_already_scheduled += tasks
        current_schedule.append(tasks)

    time_slot_count = np.sum([len(t) for t in current_schedule])
    possible_schedules.append([time_slot_count, timeslot_variation])

...

Sort possible schedules by number of time slots, and that's the solution. However, this algorithm grows in complexity exponentially with the number of time slots. Given there are hundreds of tasks and hundreds of time slots, I need a different approach.

Someone suggested LP MIP (such as Google OR Tools), but I'm not very familiar with it and am having a hard time formulating the constraints in code. Any help with either LP or some other solution that can help me get started in the right direction is much appreciated (doesn't have to be Python, can even be Excel).

Solution

My proposal for a MIP model:

Introduce binary variables:

x(i,t) = 1 if task i is assigned to slot t
         0 otherwise

y(t) = 1 if slot t has at least one task assigned to it
       0 otherwise

Furthermore let:

N = max number of tasks per slot
ok(i,t) = 1 if we are allowed to assign task i to slot t
          0 otherwise

Then the model can look like:

minimize sum(t,y(t))                    (minimize used slots)    
sum(t, ok(i,t)*x(i,t)) = 1   for all i  (each task is assigned to exactly one slot)      
sum(i, ok(i,t)*x(i,t)) <= N  for all t  (capacity constraint for each slot)
y(t) >= x(i,t)  for all (i,t) such that ok(i,t)=1
x(i,t),y(t) in {0,1}                    (binary variables)

Using N=3, I get a solution like:

----     45 VARIABLE x.L  assignment

                s5          s6          s7         s13

task1                    1.000
task2                                1.000
task3                    1.000
task4                    1.000
task5        1.000
task6                                            1.000
task7                                            1.000
task8                                            1.000
task9                                1.000
task10                               1.000

The model is fairly simple and it should not be very difficult to code and solve it using your favorite MIP solver. The one thing you want to make sure is that only variables x(i,t) exist when ok(i,t)=1. In other words, make sure that variables do not appear in the model when ok(i,t)=0. It can help to interpret the assignment constraints as:

sum(t | ok(i,t)=1, x(i,t)) = 1   for all i  (each task is assigned to exactly one slot)      
sum(i | ok(i,t)=1, x(i,t)) <= N  for all t  (capacity constraint for each slot)

where | means 'such that' or 'where'. If you do this right, your model should have 50 variables x(i,t) instead of 10 x 17 = 170. Furthermore we can relax y(t) to be continuous between 0 and 1. It will be either 0 or 1 automatically. Depending on the solver that may affect performance.

I have no reason to believe this is easier to model as a constraint programming model or that it is easier to solve that way. My rule of thumb is, if it is easy to model as a MIP stick to a MIP. If we need to go through lots of hoops to make it a proper MIP, and a CP formulation makes life easier, then use CP. In many cases this simple rule works quite well.