python pandas genetic-algorithm schedule production

Fitness function in a Genetic Algorithm in Python

I'm trying to write a proper fitness function for a problem that we've chosen to solve with a GA. The problem consists of finding the production start dates for different items so as to minimize conflicts in the available hours of the machines used in each step of transforming raw material into a final product. I'm a little lost with the fitness function and I don't know how to proceed.

For each machine, I've calculated the total load for each production day in a range. I then calculate the total number of overloaded days and use it as the "note" (score) for every candidate solution in my GA. Right now it is quite simple: it just returns that value, and I think the concept behind my fitness function is probably wrong.

    def evaluate(self, scenario):
        sum_overload = calculate_load_machine(self.reference_date, scenario).to_numpy().sum()
        self.evaluation_note = sum_overload

For example:

I have a table with machines as the index and production days as columns. The production load is calculated for each day, and if it surpasses 1.00 (100% of load capacity) the machine is considered overloaded on that day.

Machines	20/02/2023	21/02/2023	22/02/2023	23/02/2023
mA	0.86	0.80	0.74	0.90
mB	0.90	0.51	0.86	1.10
mC	0.33	0.25	0.24	0.50
mD	1.20	1.15	0.99	0.95

The overload table is presented as:

Machines	20/02/2023	21/02/2023	22/02/2023	23/02/2023
mA	0.00	0.00	0.00	0.00
mB	0.00	0.00	0.00	1.00
mC	0.00	0.00	0.00	0.00
mD	1.00	1.00	0.00	0.00

Summing the result of calculate_load_machine for this overload table gives 3.
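
For reference, here is a minimal sketch of how the overload table and its sum could be derived from the load table with pandas. The names load and overload are only illustrative (they are not part of my real code), and the 22/02/2023 column, which has no overloads, is included:

```python
import pandas as pd

# Load table from the example above: machines as index, production days as columns.
load = pd.DataFrame(
    {
        "20/02/2023": [0.86, 0.90, 0.33, 1.20],
        "21/02/2023": [0.80, 0.51, 0.25, 1.15],
        "22/02/2023": [0.74, 0.86, 0.24, 0.99],
        "23/02/2023": [0.90, 1.10, 0.50, 0.95],
    },
    index=["mA", "mB", "mC", "mD"],
)

# A machine-day counts as overloaded when its load exceeds 1.00 (100% of capacity).
overload = (load > 1.0).astype(float)

# Total number of overloaded machine-days, used as the "note" of a scenario.
sum_overload = overload.to_numpy().sum()
print(sum_overload)  # 3.0
```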

I am thinking about changing the fitness function so that it is based on the total number of items I can produce in a day. For each machine mN, a mix of different products contributes to the machine's load, so I should choose which products to make in order not to overload the machine.

Any review, advice or comment is valid, thank you for your help!

Solution

I think that your approach is a good one. However, you may want to complement the penalty for overload by rewarding solutions that make efficient use of machine time: assign a higher fitness score to solutions that use the available machine capacity without overloading it, and penalize solutions that leave machine time unused.

One approach is the following. You can adjust the base conflict penalty (conflict_penalty = 0.5) and the machine overload (machine_overload = df/4-1, where I assume 4 items per machine per day).

import pandas as pd

class FitnessEvaluator:
    def __init__(self, reference_date):
        self.reference_date = reference_date
        self.evaluation_note = None

    def evaluate(self, scenario):
        total_overload = calculate_load_machine(self.reference_date, scenario).to_numpy().sum()
        items_per_day = calculate_items_per_day(scenario)
        total_items = items_per_day.to_numpy().sum()
        conflict_penalty = calculate_conflict_penalty(scenario)
        fitness = total_items - total_overload - conflict_penalty
        self.evaluation_note = fitness
        return fitness


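def calculate_load_machine(reference_date, df):
    # Offset of each column, in days, from the reference date.
    reference_day = pd.to_datetime(reference_date, format='%d/%m/%Y')
    days = (pd.to_datetime(df.columns, format='%d/%m/%Y') - reference_day).days
    # Load relative to an assumed capacity of 4 items per machine per day.
    machine_overload = df/4-1
    # Accumulate day by day. The day offset is used as the column position,
    # so this assumes consecutive daily columns starting at reference_date.
    for day in days:
        if day > 0:
            machine_overload.iloc[:, day] += machine_overload.iloc[:, day-1]
    return machine_overload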

def calculate_items_per_day(df):
    return df.sum()

def calculate_conflict_penalty(df):
    # Each column of df is treated as one item here.
    items = df.columns
    conflict_penalty = 0.5  # base penalty; adjust as needed
    for item in items:
        # Every extra machine sharing the same item adds 1 to the penalty.
        machines_with_item = df.loc[df[item] > 0].index
        if len(machines_with_item) > 1:
            conflict_penalty += len(machines_with_item) - 1
    return conflict_penalty

items_data = {'20/02/2023': [0.86,0.80,0.74,0.90],
              '21/02/2023': [0.90,0.51,0.86,1.10],
              '22/02/2023': [0.33,0.25,0.24,0.50],
              '23/02/2023': [1.20,1.15,0.99,0.95]}
items_per_day = pd.DataFrame(items_data, index=['m1', 'm2', 'm3', 'm4'])



overload_data = {'20/02/2023': [0, 0, 0, 1],
                 '21/02/2023': [0, 0, 0, 1],
                 '22/02/2023': [0, 0, 0, 0],
                 '23/02/2023': [0, 1, 0, 0]}
total_overload = pd.DataFrame(overload_data, index=['mA', 'mB', 'mC', 'mD'])

fitness_evaluator = FitnessEvaluator(reference_date='20/02/2023')

items_per_day_fitness = fitness_evaluator.evaluate(items_per_day)
print(f'Fitness of items_per_day: {items_per_day_fitness}')

total_overload_fitness = fitness_evaluator.evaluate(total_overload)
print(f'Fitness of total_overload: {total_overload_fitness}')

which returns:

Fitness of items_per_day: 32.220000000000006
Fitness of total_overload: 40.5
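
If you would rather score the load table from the question directly, the idea of penalizing overload while also penalizing unused machine time could be sketched as below. This is only an illustration under my own assumptions: the function name capacity_fitness and the weights overload_weight and idle_weight are hypothetical, not part of the code above, and the weights would need tuning for your problem.

```python
import pandas as pd


def capacity_fitness(load, overload_weight=10.0, idle_weight=1.0):
    """Score a load table (machines x days); higher is better, 0 is the best possible."""
    # Load above 100% of capacity per machine-day (0 where not overloaded).
    excess = (load - 1.0).clip(lower=0.0)
    # Unused capacity per machine-day (0 where the machine is full or overloaded).
    idle = (1.0 - load).clip(lower=0.0)
    penalty = (overload_weight * excess.to_numpy().sum()
               + idle_weight * idle.to_numpy().sum())
    return -penalty


# Load table from the question: machines as index, production days as columns.
load = pd.DataFrame(
    {
        "20/02/2023": [0.86, 0.90, 0.33, 1.20],
        "21/02/2023": [0.80, 0.51, 0.25, 1.15],
        "22/02/2023": [0.74, 0.86, 0.24, 0.99],
        "23/02/2023": [0.90, 1.10, 0.50, 0.95],
    },
    index=["mA", "mB", "mC", "mD"],
)

score = capacity_fitness(load)
# Hypothetical scenario with the same table but every overload trimmed to 100%.
score_trimmed = capacity_fitness(load.clip(upper=1.0))
print(score, score_trimmed)
```

Because the overload term is weighted more heavily than the idle term, the trimmed scenario scores higher than the original one, which is the ordering a GA that maximizes fitness needs in order to move away from overloaded schedules.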