python pandas genetic-algorithm schedule production

Fitness function in a Genetic Algorithm in Python


I'm trying to write a proper fitness function for a problem that we've chosen to solve with a GA. The problem consists of identifying the production start dates for different items so as to minimize conflicts over the available hours of the machines used in each step of transforming raw material into a final product. I'm a little lost with the fitness function and don't know how to proceed.

For each machine, I've calculated the total load for each production day in a range. I then sum the number of overloaded days and use that sum as the "note" (score) for every candidate solution in my GA. Right now it is quite simple: it just returns that value, and I suspect it is conceptually wrong as a fitness function.

    def evaluate(self, scenario):
        sum_overload = calculate_load_machine(self.reference_date, scenario).to_numpy().sum()
        self.evaluation_note = sum_overload

For example:

I have a table with machines as the index and production days as columns. The production load is calculated for each day, and if it surpasses 1.00 (100% of load capacity) the machine is considered overloaded on that day.

    Machines  20/02/2023  21/02/2023  22/02/2023  23/02/2023
    mA              0.86        0.80        0.74        0.90
    mB              0.90        0.51        0.86        1.10
    mC              0.33        0.25        0.24        0.50
    mD              1.20        1.15        0.99        0.95

The overload table is presented as:

    Machines  20/02/2023  21/02/2023  22/02/2023  23/02/2023
    mA              0.00        0.00        0.00        0.00
    mB              0.00        0.00        0.00        1.00
    mC              0.00        0.00        0.00        0.00
    mD              1.00        1.00        0.00        0.00

Summing the overload table returned by calculate_load_machine gives 3.
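
For reference, here is a minimal sketch of how the overload table and its total could be derived from the load table with pandas, assuming the overload flag is simply "load > 1.00". The name `load` is illustrative, and since the real calculate_load_machine is not shown, this is not necessarily how it works internally.

```python
import pandas as pd

# Load table from the example above: machines as rows, production days as columns.
load = pd.DataFrame(
    {
        "20/02/2023": [0.86, 0.90, 0.33, 1.20],
        "21/02/2023": [0.80, 0.51, 0.25, 1.15],
        "22/02/2023": [0.74, 0.86, 0.24, 0.99],
        "23/02/2023": [0.90, 1.10, 0.50, 0.95],
    },
    index=["mA", "mB", "mC", "mD"],
)

# A machine-day is flagged as overloaded when its load exceeds 1.00 (100% capacity).
overload = (load > 1.0).astype(float)

# Total number of overloaded machine-days, i.e. the evaluation note.
sum_overload = overload.to_numpy().sum()

print(overload)
print(sum_overload)  # 3.0
```

Note that this count treats a load of 1.01 and a load of 2.00 the same way, so the GA gets no signal about how severe an overload is.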

I am thinking about changing the fitness function so that it is based on the total number of items I can produce in a day. For each machine mN, a mix of different products contributes to the machine's load, so I would have to choose which products to make in order not to overload the machine.

Any review, advice or comment is welcome. Thank you for your help!


Solution

  • I think your approach is a good one. However, you may want to complement the overload penalty by rewarding solutions that use machine time efficiently. Assign a higher fitness score to solutions that make use of the available machine capacity without overloading, and penalize solutions that leave machine time unused.

    One approach is the following. You can adjust the conflict penalty (conflict_penalty = 0.5) and the machine overload (machine_overload = df/4-1; here I assume 4 items per machine per day).

    import pandas as pd
    
    class FitnessEvaluator:
        def __init__(self, reference_date):
            self.reference_date = reference_date
            self.evaluation_note = None
    
        def evaluate(self, scenario):
            total_overload = calculate_load_machine(self.reference_date, scenario).to_numpy().sum()
            items_per_day = calculate_items_per_day(scenario)
            total_items = items_per_day.to_numpy().sum()
            conflict_penalty = calculate_conflict_penalty(scenario)
            fitness = total_items - total_overload - conflict_penalty
            self.evaluation_note = fitness
            return fitness
    
    
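    def calculate_load_machine(reference_date, df):
        # Offset in days of each column date from the reference date.
        reference_day = pd.to_datetime(reference_date, format='%d/%m/%Y')
        days = (pd.to_datetime(df.columns, format='%d/%m/%Y') - reference_day).days
        # Load relative to an assumed capacity of 4 items per machine per day.
        machine_overload = df/4-1
        # Running total across days. The day offset is used as the column position,
        # so the columns must be consecutive days starting at reference_date.
        for day in days:
            if day > 0:
                machine_overload.iloc[:, day] += machine_overload.iloc[:, day-1]
        return machine_overload
    
    def calculate_items_per_day(df):
        # Column sums: total over all machines for each day.
        return df.sum()
    
    def calculate_conflict_penalty(df):
        items = df.columns
        # Base penalty, plus 1 for every extra machine with a nonzero value in the same column.
        conflict_penalty = 0.5
        for item in items:
            machines_with_item = df.loc[df[item] > 0].index
            if len(machines_with_item) > 1:
                conflict_penalty += len(machines_with_item) - 1
        return conflict_penalty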
    
    items_data = {'20/02/2023': [0.86,0.80,0.74,0.90],
                  '21/02/2023': [0.90,0.51,0.86,1.10],
                  '22/02/2023': [0.33,0.25,0.24,0.50],
                  '23/02/2023': [1.20,1.15,0.99,0.95]}
    items_per_day = pd.DataFrame(items_data, index=['m1', 'm2', 'm3', 'm4'])
    
    
    
    overload_data = {'20/02/2023': [0, 0, 0, 1],
                     '21/02/2023': [0, 0, 0, 1],
                     '22/02/2023': [0, 0, 0, 0],
                     '23/02/2023': [0, 1, 0, 0]}
    total_overload = pd.DataFrame(overload_data, index=['mA', 'mB', 'mC', 'mD'])
    
    fitness_evaluator = FitnessEvaluator(reference_date='20/02/2023')
    
    items_per_day_fitness = fitness_evaluator.evaluate(items_per_day)
    print(f'Fitness of items_per_day: {items_per_day_fitness}')
    
    total_overload_fitness = fitness_evaluator.evaluate(total_overload)
    print(f'Fitness of total_overload: {total_overload_fitness}')
    
    

    which returns:

    Fitness of items_per_day: 32.220000000000006
    Fitness of total_overload: 40.5
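
The advice at the top of this answer (penalize overload, and also penalize unused machine time) can be sketched more directly on the load table from the question. This is only an illustration: the function name `utilization_fitness` and the weights `overload_weight` and `idle_weight` are assumptions made up for this example, not tuned values and not part of the code above.

```python
import pandas as pd


def utilization_fitness(load, overload_weight=10.0, idle_weight=1.0):
    """Score a load table (machines x days); higher is better.

    The two weights are illustrative assumptions and would need tuning.
    """
    # How far each machine-day exceeds 100% capacity (0 where not overloaded).
    excess = (load - 1.0).clip(lower=0.0)
    # Capacity left unused on each machine-day (0 where fully loaded or overloaded).
    idle = (1.0 - load).clip(lower=0.0)
    penalty = (overload_weight * excess.to_numpy().sum()
               + idle_weight * idle.to_numpy().sum())
    # Negate so that a GA maximizing fitness minimizes the penalty.
    return -penalty


# Load table from the question: machines as rows, production days as columns.
load = pd.DataFrame(
    {
        "20/02/2023": [0.86, 0.90, 0.33, 1.20],
        "21/02/2023": [0.80, 0.51, 0.25, 1.15],
        "22/02/2023": [0.74, 0.86, 0.24, 0.99],
        "23/02/2023": [0.90, 1.10, 0.50, 0.95],
    },
    index=["mA", "mB", "mC", "mD"],
)

score = utilization_fitness(load)
print(f"Fitness: {score:.2f}")
```

Because the overload term uses the size of the excess rather than a 0/1 flag, a schedule that reduces a load from 1.20 to 1.05 already scores better, which gives the GA a smoother gradient to follow. A schedule with every machine at exactly 1.00 scores 0, the best possible value under this sketch.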