Minimize number of unallocated tasks in PuLP

I'm new to linear programming and am trying to use PuLP to allocate tasks to volunteers. I've generated some fake data below. In this fake dataset, each of 50 volunteers (numbered V1 to V50) were given a list of 100 tasks (numbered T1 to T100). They were asked to indicate which 7 tasks they were most willing to take up, and how many tasks they wanted to take up (they are allowed to take up to 3 tasks). Each task was also assigned a number between 1 to 3 which indicates how time-consuming the task is.

I want to allocate the tasks to the volunteers in a way that ensures everyone gets a task, and that they don't get too many tasks / too heavy a workload. In my initial code, I also added a constraint that all tasks should be assigned at least one volunteer. However, with my real dataset (and also in this test dataset), I keep getting an infeasible solution because there aren't enough volunteers to cover all the tasks. I looked at other answers which suggested that I could loosen the constraint by adding a penalty. How do I either penalize unassigned tasks, or change my objective statement to minimize the number of unassigned tasks? My current objective statement just maximizes the number of assignments instead.

### CREATING DATA

# Setting parameters
np.random.seed(42)
num_tasks = 100
num_volunteers = 50

# Creating list of 100 tasks numbered T1 to T100
tasks = [f"T{i}" for i in range(1,num_tasks + 1)]

# Each task is labelled with a number 1 - 3 that indicates estimated time taken to complete
task_load = dict(zip(tasks, [np.random.choice([1,2,3]) for i in range(num_tasks)]))

# Creating list of 50 volunteers numbered V1 to V50
volunteers = [f"V{i}" for i in range(1,num_volunteers+1)]

# Each volunteer is asked to choose 7 tasks to be assigned
volunteer_choices = dict(zip(volunteers, [list(np.random.choice(tasks, size = 7)) for i in range(num_volunteers)]))

# Each volunteer can choose to take between 1 - 3 tasks
volunteer_max_tasks = dict(zip(volunteers, [np.random.choice([1, 2, 3]) for i in range(num_volunteers)]))


### DEFINING MODEL

# Define model
model = LpProblem(name = "resource-allocation", sense = LpMaximize)

# Define decision variables
variables = LpVariable.dicts("Pair", (volunteers, tasks), 0, None, LpBinary)

# Set list of all possible pairs
pairs = [(v, t) for t in tasks for v in volunteers]

# Set objective
model += (lpSum([variables[v][t] for (v, t) in pairs]))


### SETTING CONSTRAINTS

# All volunteers must be assigned at least one task
for v in volunteers:
    model += (lpSum([variables[v][t] for t in tasks]) >= 1)

# Volunteers cannot be assigned to more tasks than they are willing to take on
for v in volunteers:
    model += (lpSum([variables[v][t] for t in tasks]) <= volunteer_max_tasks[v])

# Volunteers cannot be assigned too high a work load
for v in volunteers:
    model += (lpSum([variables[v][t] * task_load[t] for t in tasks]) <= 6)
    
# Volunteers cannot be assigned a task they didn't choose
for v in volunteers:
    for t in tasks:
        if not (t in volunteer_choices[v]):
            model += (variables[v][t] == 0)
    
# All tasks must get a volunteer (CAN I LOOSEN THIS?)
for t in tasks:
    model += (lpSum([variables[v][t] for v in volunteers]) >= 1)

I have to reiterate that I'm extremely new to linear programming. I'm not much of a programmer (just someone from an understaffed NGO with a bit of self-taught Python knowledge) and I started looking into PuLP specifically to solve simple task allocation problems like this. So I hope in your answer, instead of "narrating" the solution (e.g. "Create a variable for X and then include it in your objective function") you can write out the code instead, if possible🙏? Some questions that seem related are this and this? But I was unable to figure out how to adapt it to my problem.

I originally also had a last constraint which limits the number of volunteers assigned to each task to 3, but I removed it in an attempt to massage out a feasible solution.

# Each task cannot be assigned more than 3 volunteers
for t in tasks:
    model += (lpSum([variables[v][t] for v in volunteers]) <= 3)

CLARIFICATIONS ON OBJECTIVES: We generally want to ensure that every volunteer who signed up for the program gets a task! For context, examples of programs that require task allocation may include outreach programs with schools, where every student is tasked to work on small project(s) related to our cause so that they can learn more about it. As such, we don't have so much concerns about efficiency (getting as little workers to do as many tasks as possible), so much as ensuring every student gets a task in a way that minimizes the number of unassigned tasks. I hope that makes a little more sense with context...

Solution

Give this a whirl...

A couple pointers. Your code was close, but you cannot force each task to be assigned once without some guarantee that is feasible.... that is where your problem was.

I think the thing you were looking for is the extra variable and constraint that I put in to count tasks that are "covered", which is separate from pairs. That allows the objective to maximize "covered" tasks (same as minimizing unallocated tasks). There is a linking constraint to support that.

I also added a little "sugar" to the obj function to incentivize making more pairs, even if they are not needed to cover tasks.

This solves for 1000 tasks and 500 volunteers in less than 10 secs. If you have numbers that are much larger than that (unlikely!) there are probably a few enhancements that could be tweaked later.

Also, numpy not needed... and you had lots of unnecessary parens and list constructions (if you do sum or LpSum you can just put the stuff in parens (generator) and no need to make a list comprehension.)

Code:

from pulp import *
from random import choice, sample

### CREATING DATA

# Setting parameters

num_tasks = 10
num_volunteers = 7

# Creating list of 100 tasks numbered T1 to T100
tasks = [f"T{i}" for i in range(1,num_tasks + 1)]

# Each task is labelled with a number 1 - 3 that indicates estimated time taken to complete
task_load = dict(zip(tasks, [choice([1,2,3]) for i in range(num_tasks)]))

# Creating list of 50 volunteers numbered V1 to V50
volunteers = [f"V{i}" for i in range(1,num_volunteers+1)]

# Each volunteer is asked to choose 7 tasks to be assigned
volunteer_choices = dict(zip(volunteers, [list(sample(tasks, k=7)) for i in range(num_volunteers)]))

# Each volunteer can choose to take between 1 - 3 tasks
volunteer_max_tasks = dict(zip(volunteers, [choice([1, 2, 3]) for i in range(num_volunteers)]))

### DEFINING MODEL

# Define model
model = LpProblem(name = "resource-allocation", sense = LpMaximize)

# Define decision pair
pair = LpVariable.dicts("Pair", (volunteers, tasks), cat=LpBinary)  # no need for upper/lower bound for binary. :)
task_covered = LpVariable.dicts("Covered", tasks, cat=LpBinary)

# Set list of all possible pairs
pairs = [(v, t) for t in tasks for v in volunteers]

# Set objective
model += lpSum(task_covered[t] for t in tasks) + \
         0.05 * lpSum(pair[v][t] for v in volunteers for t in tasks)   # a little "sugar" to maximize assignments

# model += (lpSum([pair[v][t] for (v, t) in pairs]))  # not needed


### SETTING CONSTRAINTS

#  NEW:  Link the coverage of task to having at least one volunteer paired with it.
#        The model wants to "increase" the covered task variable, so we need to clamp it
#        down unless there is at least one pair assigned.  Make sense?
for t in tasks:
    model += task_covered[t] <= lpSum(pair[v][t] for v in volunteers)

# All volunteers must be assigned at least one task  <-- superfluous constraint.  Model is "trying" to do this
# for v in volunteers:
#     model += lpSum(pair[v][t] for t in tasks) >= 1

# Volunteers cannot be assigned to more tasks than they are willing to take on
for v in volunteers:
    model += lpSum(pair[v][t] for t in tasks) <= volunteer_max_tasks[v]

# Volunteers cannot be assigned too high a work load
for v in volunteers:
    model += lpSum(pair[v][t] * task_load[t] for t in tasks) <= 6
    
# Volunteers cannot be assigned a task they didn't choose
for v in volunteers:
    for t in tasks:
        if not (t in volunteer_choices[v]):
            model += pair[v][t] == 0
    
# All tasks must get a volunteer (CAN I LOOSEN THIS?)  # This is where your infeasibility problem was
# for t in tasks:
#     model += (lpSum([pair[v][t] for v in volunteers]) >= 1)

model.solve()

for v in sorted(volunteers):
    for task in sorted(pair[v]):
        if pair[v][task].varValue:
            print(f'pair: {v} to task {task}')

Yields

Result - Optimal solution found

Objective value:                10.85000000
Enumerated nodes:               0
Total iterations:               0
Time (CPU seconds):             0.00
Time (Wallclock seconds):       0.00

Option for printingOptions changed from normal to all
Total time (CPU seconds):       0.00   (Wallclock seconds):       0.00

pair: V1 to task T4
pair: V1 to task T5
pair: V1 to task T9
pair: V2 to task T8
pair: V3 to task T10
pair: V3 to task T9
pair: V4 to task T2
pair: V4 to task T4
pair: V4 to task T7
pair: V5 to task T1
pair: V5 to task T3
pair: V5 to task T4
pair: V6 to task T7
pair: V6 to task T8
pair: V7 to task T4
pair: V7 to task T6
pair: V7 to task T7
[Finished in 176ms]