
Is it possible to automatically infer the class_weight from flow_from_directory in Keras?


I have an imbalanced multi-class dataset and I want to use the class_weight argument of fit_generator to weight the classes according to the number of images in each class. I'm using ImageDataGenerator.flow_from_directory to load the dataset from a directory.

Is it possible to directly infer the class_weight argument from the ImageDataGenerator object?


Solution

  • Just figured out a way of achieving this.

    from collections import Counter

    from keras.preprocessing.image import ImageDataGenerator

    train_datagen = ImageDataGenerator()
    train_generator = train_datagen.flow_from_directory(...)

    # Count how many images fall into each class index
    counter = Counter(train_generator.classes)
    max_val = float(max(counter.values()))
    # Weight = (size of the largest class) / (size of this class)
    class_weights = {class_id: max_val / num_images for class_id, num_images in counter.items()}

    model.fit_generator(...,
                        class_weight=class_weights)
    

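    train_generator.classes holds the integer class index of each image, so Counter(train_generator.classes) counts the images in each class. Each class is then weighted by the ratio of the largest class count to its own count: the most frequent class gets weight 1.0 and rarer classes get proportionally larger weights.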

    Note that these weights may not be good for convergence, but you can use them as a starting point for other types of weighting based on class frequency.

    This answer was inspired by: https://github.com/fchollet/keras/issues/1875#issuecomment-273752868
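To make the weighting concrete without a dataset on disk, here is a minimal sketch that applies the same Counter-based ratio to a hand-written list of class indices. The list `labels` is a hypothetical stand-in for `train_generator.classes`, and both function names are made up for illustration. The second function is not part of the answer above: it shows one possible alternative weighting, n_samples / (n_classes * count), which is the heuristic scikit-learn uses for its "balanced" class weights.

```python
from collections import Counter


def max_ratio_weights(labels):
    """Weight each class by (largest class count) / (its own count)."""
    counts = Counter(int(label) for label in labels)
    largest = float(max(counts.values()))
    return {class_id: largest / n for class_id, n in counts.items()}


def balanced_weights(labels):
    """Alternative: weight each class by n_samples / (n_classes * its own count)."""
    counts = Counter(int(label) for label in labels)
    n_samples = float(sum(counts.values()))
    n_classes = len(counts)
    return {class_id: n_samples / (n_classes * n) for class_id, n in counts.items()}


# Hypothetical stand-in for train_generator.classes:
# 6 images of class 0, 3 of class 1, 1 of class 2.
labels = [0] * 6 + [1] * 3 + [2] * 1

print(max_ratio_weights(labels))  # {0: 1.0, 1: 2.0, 2: 6.0}
print(balanced_weights(labels))
```

Either dictionary can be passed as class_weight. The two schemes differ only by a constant factor, so they rank the classes identically; the "balanced" variant keeps the total weighted sample count equal to the real sample count, which may keep the loss on a scale closer to the unweighted case.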