Search code examples
pythonconstraintsvariable-assignmentlinear-programmingpulp

Python Pulp LP leaving rooms unassigned and penalties/constraints not preventing that


I am having difficulty getting all classrooms to be assigned to a class for a room assignment problem. I am using a grid calculated based on class capacity & room capacity (values 0-1). Specifically it is leaving a lot of classes unassigned even when there is a room that can accommodate them (especially the rooms that hold 110 since the highest class capacity is 90). I tried penalizing unassigned classes but am getting the same maximization score every time so something is very broken. Any help is appreciated!

Sample Classes needing Assignment

SINF ID Term Subject Code Catalog Number Course Section # Section Type Title/Topic Meeting Pattern Meetings Instructor Room Inst. Method Credit Hrs Grade Mode Maximum Enrollment Rm Cap Request Meets
12345 Fall 2024 ALI 777 ALI 777 1 Lecture M 6pm-9pm M 6pm-9pm Pending General Assignment Room In Person 3 Graded (A-E, I) 900 900
16264 Fall 2024 MGT 3030 MGT 3030 5 Lecture TTh 10:45am-12:05pm TTh 10:45am-12:05pm 06018334 General Assignment Room In Person 3 Graded (A-E, I) 80 80
NEW Fall 2024 FIN 2020 FIN 2020 5 Lecture MW 9:10am-10:30am MW 9:10am-10:30am Pending General Assignment Room In Person 3 Graded (A-E, I) 90 90
16576 Fall 2024 ACC 2100 ACC 2100 1 Lecture TTh 10:45am-12:05pm TTh 10:45am-12:05pm Pending General Assignment Room In Person 3 Graded (A-E, I) 250 250
NEW Fall 2024 FIN 2120 FIN 2120 1 Lecture M 6pm-9pm M 6pm-9pm 06063522 General Assignment Room In Person 3 Graded (A-E, I) 40 40
7464 Fall 2024 MKT 4840 MKT 4840 1 Lecture TTh 9:10am-10:30am TTh 9:10am-10:30am 00253981 General Assignment Room In Person 3 Graded (A-E, I) 60 60
9888 Fall 2024 ECON 2011 ECON 2011 1 Lecture MW 9:10am-10:30am MW 9:10am-10:30am 06031544 General Assignment Room In Person 3 Graded (A-E, I) 40 40
16582 Fall 2024 ACC 2100 ACC 2100 5 Lecture TTh 9:10am-10:30am TTh 9:10am-10:30am Pending General Assignment Room In Person 3 Graded (A-E, I) 80 80
NEW Fall 2024 OS 3440 OS 3440 13 Lecture W 6pm-9pm W 6pm-9pm Pending General Assignment Room In Person 3 Graded (A-E, I) 90 90
11975 Fall 2024 MGT 3810 MGT 3810 7 Lecture W 9:10am-10:30am W 9:10am-10:30am 00332929 General Assignment Room Hybrid 3 Graded (A-E, I) 32 32

Rooms & Capacities

BLD C 105   62
BLD C 106   56
BLD C 107   56
BLD C 108   42
BLD C 203   56
BLD C 204   22
BLD C 206   32
BLD C 207   32
BLD C 208   42
BLD C 210   48
BLD C 211   30
BLD C 212   56
BLD C 301   40
BLD C 302   30
BLD C 303   25
BLD C 304   40
BLD C 305   56
BLD C 435   8
BLD R 105   40
BLD R 110   36
BLD R 115   102
BLD R 125   10
BLD R 205   80
BLD R 210   80
BLD R 212   6
BLD R 214   6
BLD R 215   102
BLD R 216   6
BLD R 218   6
BLD R 220   6
BLD R 222   6
BLD R 224   6
BLD R 226   6
BLD R 228   6
BLD R 230   6
BLD R 232   6
BLD R 250   12
BLD B 110   110
BLD B 1100A 80
BLD B 1110  268
BLD B 1170  80
BLD B 1180  80
BLD B 130   80
BLD B 160   110
BLD B 161   8
BLD B 163   8
BLD B 165   8
BLD B 167   8
BLD B 169   8
BLD B 170   110
BLD B 171   8
BLD B 173   8
BLD B 175   8
BLD B 177   8
BLD B 180   110
BLD B 181   14
BLD B 183   8
BLD B 2100A 20
BLD B 3106  62
BLD B 3112  1
BLD B 3130  30
BLD B 3153  50
BLD B 3157  10
BLD B 3160  62
BLD B 3170  90
BLD B 3180  80
BLD B 5100D 50
BLD B 5122  4
BLD B 5123  6
BLD B 5125  6
BLD B 5127  6
BLD B 5130  83
BLD B 5134  4
BLD B 5140  50
BLD B 5155  8
BLD B 5160  80
BLD B 5160A 40
BLD B 5160B 40
BLD B 5161  8
BLD B 5163  8
BLD B 5165  8
BLD B 5167  8
BLD B 5169  8
BLD B 5170  80
BLD B 5175  15
BLD B 5180  80
BLD B 7112  20
BLD B 7122  12
BLD B 7170  60
BLD B 7175  12
BLD B 7177  10
BLD B 7180  60
BLD B 8159  7
BLD B BALLROOM  120
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')
#/content/drive/My Drive/Work Projects/Projects/Scheduling/Ali Chaos/CLSS_FA25_sample.xlsx'
# Define the paths to the files
this_year_input_file_path = '/content/drive/My Drive/WorkProjects/StackOverflowHelp/ClassSchedule_sample.xlsx'
capacity_file_path = '/content/drive/My Drive/WorkProjects/StackOverflowHelp/RoomCapacities.xlsx'

# Load the input Excel files into pandas DataFrames
print("Loading data from Excel files...")
this_year_df = pd.read_excel(this_year_input_file_path)
capacity_df = pd.read_excel(capacity_file_path)

# Process data
print("Processing this year's data...")
this_year_df = this_year_df[~this_year_df['Room'].isin(["No Meeting Pattern", "ONLINE", "CANVAS"])]
this_year_df['Class'] = this_year_df['Course'] + ' - ' + this_year_df['Section #'].astype(str).str.zfill(3)

# Split 'Meeting Pattern' into 'Days' and 'Times'
def split_meeting_pattern(meeting_pattern):
    if pd.isna(meeting_pattern):
        return pd.Series([None, None])
    try:
        parts = meeting_pattern.split(' ', 1)
        if len(parts) == 2:
            days, time_range = parts
            start_time, end_time = time_range.split('-')

            # Fix start time if minutes are missing
            if start_time[-2:] in ['am', 'pm'] and ':' not in start_time:
                start_time = start_time[:-2] + ':00' + start_time[-2:]

            # Fix end time if minutes are missing
            if end_time[-2:] in ['am', 'pm'] and ':' not in end_time:
                end_time = end_time[:-2] + ':00' + end_time[-2:]

            time_range = start_time + '-' + end_time  # Recombine
            return pd.Series([days, time_range])
        else:
            days, time_range = parts[0], None
            return pd.Series([days, time_range])
    except Exception as e:
        print(f"Error splitting meeting pattern '{meeting_pattern}': {e}")
        return pd.Series([None, None])

print("Splitting meeting patterns...")
this_year_df[['Days', 'Times']] = this_year_df['Meeting Pattern'].apply(split_meeting_pattern)

this_year_df_filtered = this_year_df[['Class', 'Days', 'Times', 'Meetings', 'Instructor', 'Room', 'Rm Cap Request']]

# Save DataFrames to Excel files
print("Saving DataFrames to Excel files...")
output_folder = '/content/drive/My Drive/WorkProjects/StackOverflowHelp/Output/'
this_year_output_file = output_folder + 'this_year_df_filtered.xlsx'
capacity_output_file = output_folder + 'capacity_df.xlsx'

# Create the output folder if it does not exist
if not os.path.exists(output_folder):
    os.makedirs(output_folder)

this_year_df_filtered.to_excel(this_year_output_file, index=False)
capacity_df.to_excel(capacity_output_file, index=False)

print("Files saved:")
print(f"This year DataFrame: {this_year_output_file}")
print(f"Capacity DataFrame: {capacity_output_file}")

# Check DataFrame shapes and content
print("Checking DataFrame shapes and content...")
print(f"this_year_df shape: {this_year_df.shape}")
print(f"this_year_df_filtered shape: {this_year_df_filtered.shape}")
print(f"capacity_df shape: {capacity_df.shape}")

print(this_year_df_filtered.head())
print(capacity_df.head())

CREATING SCORING GRID

#Score Grid based on only Capacity Match
import pandas as pd
import re

# Initialize the grid with zeros and default scores
print("Initializing the grid...")
classes = this_year_df_filtered['Class'].tolist()
rooms = capacity_df['Room'].astype(str)
grid = pd.DataFrame(0, index=classes, columns=rooms)

print(grid)
# Room capacities
room_capacity = dict(zip(capacity_df['Room'].astype(str), capacity_df['Capacity']))

for c in classes:
    if not this_year_df_filtered[this_year_df_filtered['Class'] == c].empty:
        max_enrollment = this_year_df_filtered[this_year_df_filtered['Class'] == c].iloc[0]['Rm Cap Request'] #max class enrollment
        for r in rooms:
            if room_capacity[r] >= max_enrollment:  # Room can accommodate the class
                utilization = max_enrollment / room_capacity[r]
                grid.at[c, r] = utilization
            else:  # Room cannot accommodate the class
                grid.at[c, r] = 0

# Save the match score grid to an Excel file
grid_output_file = output_folder + 'match_score_grid.xlsx'
grid.to_excel(grid_output_file)

print(f"Match Score Grid has been saved to {grid_output_file}")

#TIME CONVERT FUNCTION

def time_to_minutes(time_str):
    if time_str is None:
        return None
    time_str = time_str.strip().lower()
    if time_str[-2:] not in ['am', 'pm']:
        time_str += 'am'
    period = time_str[-2:]
    # Handle potential multiple colons by splitting only on the first one
    time_parts = time_str[:-2].split(':', 1)  # Split only once
    if len(time_parts) == 2:
        # If minutes are missing, consider it 0
        try:
            hours, minutes = map(int, time_parts)
        except ValueError:
            hours = int(time_parts[0])
            minutes = 0  # Consider minutes to be 0
    else:
        hours = int(time_parts[0])
        minutes = 0

    if period == 'pm' and hours != 12:
        hours += 12
    if period == 'am' and hours == 12:
        hours = 0
    return hours * 60 + minutes

# Function to check for time overlap
def time_overlap(time1, time2):
    if time1 is None or time2 is None:
        return False

    days1, hours1 = time1
    days2, hours2 = time2

    if not set(days1).isdisjoint(days2):
        start1, end1 = map(time_to_minutes, hours1.split('-'))
        start2, end2 = map(time_to_minutes, hours2.split('-'))

        return not (end1 <= start2 or end2 <= start1)

    return False
from pulp import LpMaximize, LpProblem, LpVariable, lpSum, value

## Define the optimization problem
def solve_optimization(include_back_to_back=True):
    print("Starting optimization...")
    prob = LpProblem("Classroom_Assignment", LpMaximize)

    # Define decision variables
    x = LpVariable.dicts("classroom_assignment", [(c, r) for c in classes for r in rooms], cat='Binary')

# Constraints
    print("Reviewing Constraints...")

    # Must assign all classes 
     ###HELP!  Nothing I am doing works....           
    
    # Each class is assigned to exactly one room
    for c in classes:
        prob += lpSum(x[(c, r)] for r in rooms) == 1, f"Class_Assigned_{c}" #label
    
    # Maximum enrollment must be equal to or below room capacity
    for c in classes:
        if not this_year_df_filtered[this_year_df_filtered['Class'] == c].empty:
            max_enrollment = this_year_df_filtered[this_year_df_filtered['Class'] == c].iloc[0]['Rm Cap Request']
            for r in rooms:
                prob += (x[(c, r)] * max_enrollment <= room_capacity[r], #ensure r capacity < enrollment
                         f"Room_Capacity_Constraint_Class_{c}_Room_{r}") #Label

    # No overlapping classes in the same room
    class_times = [] # Used to identify scheduling overlaps
    for c in classes:
        df_c = this_year_df_filtered[this_year_df_filtered['Class'] == c][['Days', 'Times']]
        if not df_c.empty:
            class_times.append((c, df_c.values[0]))
    class_times.sort(key=lambda x: time_to_minutes(x[1][1].split('-')[0]))  # Sort by start time

    for r in rooms:
        for i in range(len(class_times)):
            c1, time1 = class_times[i]
            for j in range(i + 1, len(class_times)):
                c2, time2 = class_times[j]
                if time_overlap(time1, time2):
                    prob += x[(c1, r)] + x[(c2, r)] <= 1, f"Overlap_{c1}_{c2}_{r}"

# Penalties
    print("Assessing Penalties...")
    
    # Unassigned classes
    unassigned_class_p = lpSum((1 - lpSum(x[(c, r)] for r in rooms)) * 500 for c in classes)
        #counts unassigned classes and penalizes by 5 points for each

    # Underused Classroom Capacity
    unused_capacity_p = 0
    for r in rooms: 
      for c in classes: 
        if value(x[(c, r)]) == 1:  #if assigned a room
          if grid.at[c, r] == 1:  #fully utilized
              continue  # Skip penalty
          if grid.at[c,r] <= .5:  #and if the utilization < 50%
              unused_capacity_p += (1-grid.at[c,r]) * room_capacity[r]  # underused penalty
          elif grid.at[c,r] < .85:  #if over 50% but <85%
              unused_capacity_p += (1 - grid.at[c,r]) * room_capacity[r] *.5 # Less penalty

# Objective function: Maximize the total match score
    objective = lpSum(grid.loc[c, r] * x[(c, r)] for c in classes for r in rooms) - unused_capacity_p - unassigned_class_p
    
    prob += objective #easier for editing later if we need to change the function
 
# Solve the problem
    print("Solving the problem...")
    prob.solve()

    # Extract results
    assignment = {}
    for c in classes:
        for r in rooms:
            if value(x[(c, r)]) == 1:  #check if room is assigned
                assignment[c] = r
                assigned = True
                break #stop from looping once a class is assigned
            else: 
              if (x[(c, r)]) == 0:
                assignment[c] = "Help!" # If no room was assigned, mark as "Error"

    total_match_score = value(prob.objective)

    return assignment, total_match_score, unassigned_class_p

    assignment, total_match_score, unassigned_class_p = solve_optimization()

    print("Optimal Assignment:")
    for c, r in assignment.items():
        print(f"{c} -> {r}")

    print(f"Total Match Score: {total_match_score}")
    print(f"Unassigned Class Penalty: {value(unassigned_class_p)}")

Solution

  • A couple of things before we get to solution...

    • You should strive to make a minimally reproducible example like mine below. Asking folks to parse out data and including a bunch of "secondary" code that doesn't get right at the problem typically leads to non-answered questions on this site
    • Start with SMALL data that you can hand-jam and see if it is working. Then, when confident, turn it loose on large data
    • You can certainly work with pandas if comfortable with that, I find that commingling pandas into the optimization just gets tough to read for some, and tough to troubleshoot
    • do ALL of your pre-processing before and outside of the optimization so you can check it (unit test/print out/etc.)

    Onward... One of the key flaws in your optimization is that you are asking for the value of assign inside of the optimization in the grid piece. That is illegal. The value isn't known when the problem is built. You need to re-factor it somehow (shown)

    You should always check the status of the solve. It should be "on the screen" and it can be checked (see below). If you get anything other than "optimal" the values loaded are unusable

    CODE:

    from pulp import LpMaximize, LpProblem, LpVariable, lpSum, value, LpBinary
    
    ### DATA
    # Classes and sizes
    classes = {
        'MATH 1': 220,
        'MATH 2': 100,
        'MATH 3': 100,
        'SCI 1': 150,
        'SCI 2': 300,
        'SCI 3': 30,
        'BASKET WEAVING': 5
    }
    
    # overlapping classes  ...  Compute this somewhere
    overlapping_classes = (
        ('MATH 1', 'MATH 2'),
        ('MATH 3', 'SCI 1'),
    )
    
    # Room Sizes
    rooms = {
        'Bldg 1: 45': 200,
        'Bldg 1: 35': 200,
        'Bldg 2: 6a': 20,
        'Bldg 2: 6b': 40,
        'Bldg 3: 401': 600,
    }
    
    # Fill Rate
    fill_rate = {(c, r): classes[c]/rooms[r] for c in classes for r in rooms}
    # print(fill_rate)
    
    # legal assignments.
    # this is not "required", but by doing so, you greatly reduce the scope of the problem
    # the alternative is to add an additional constraint that the assignment <= capacity for
    # each assignment
    legal_assignments = [(c, r) for (c, r) in fill_rate if fill_rate[c, r] <= 1.0]
    print(f'Found {len(legal_assignments)} legal assignments:  {legal_assignments}')
    
    ### THE OPTIMIZATION
    
    prob = LpProblem('class assignment', LpMaximize)
    
    # VARS
    
    # the key variable is BINARY (yes/no decision to assign class to room
    # by using the "legal assignments" we control the domain well...
    assign = LpVariable.dicts('assign', indices=legal_assignments, cat=LpBinary)
    
    # CONSTRAINTS
    
    # can only schedule each class once
    for c in classes:
        prob += lpSum(assign[c, r] for r in rooms if (c,r) in legal_assignments) <= 1
    
    # can't overlap schedule
    for c1, c2 in overlapping_classes:
        for r in rooms:
            if (c1, r) in legal_assignments and (c2, r) in legal_assignments:
                prob += assign[c1, r] + assign[c2, r] <= 1
    
    # OBJ:  maximize scheduling of classes -> room with bonus for high utilization
    # we need to weight the sum of utilization credits such that it can never be better than
    # the decision to schedule a single class.... so:
    ute_weight = 1/len(classes)
    
    #       Total Asmts     fill-rate bonus, weighted
    prob += lpSum(assign) + ute_weight * lpSum(assign[c, r]*fill_rate[c, r] for (c,r) in legal_assignments)
    
    ### CHECK IT
    
    ### SOLVE IT
    prob.solve()
    
    ### PROCESS RESULTS
    if value(prob.status) == 1:  # optimal solve
        print(f'Objective value: {value(prob.objective)}')
        print(f'Solution status: {value(prob.status)}')
        for c, r in legal_assignments:
            if value(assign[c, r]) > .001: # non-zero
                print(f'Assign class {c} to room {r}')
    else:
        print('failed, see solver log')
    

    Output (partial):

    Result - Optimal solution found
    
    Objective value:                7.51666667
    Enumerated nodes:               0
    Total iterations:               0
    Time (CPU seconds):             0.00
    Time (Wallclock seconds):       0.00
    
    Option for printingOptions changed from normal to all
    Total time (CPU seconds):       0.00   (Wallclock seconds):       0.00
    
    Objective value: 7.516666666666667
    Solution status: 1
    Assign class MATH 1 to room Bldg 3: 401
    Assign class MATH 2 to room Bldg 1: 35
    Assign class MATH 3 to room Bldg 1: 35
    Assign class SCI 1 to room Bldg 1: 45
    Assign class SCI 2 to room Bldg 3: 401
    Assign class SCI 3 to room Bldg 2: 6b
    Assign class BASKET WEAVING to room Bldg 2: 6a