computer-vision, conv-neural-network, yolo

Using an unbalanced dataset in YOLO


I was wondering whether training YOLO on an unbalanced dataset would make it less accurate. Would the classes with fewer images have lower accuracy?

I have 3 classes and 14.4k images in total.

One class has 12,000 example images; the other two have 1,000 example images each.

Would this be an issue?

I am training YOLOR right now, and my mAP is 0.36 on my custom dataset.

I ran the trained weights and the classification is good, but I need to set the confidence threshold very low, because the classes with fewer images get very low confidence scores (0.05 - 0.12), while the class with 12,000 images gets confidence scores of 0.45 - 0.90.


Solution

  • Dataset imbalance generally degrades performance. There are, however, a few tricks that may help in your situation:

    1. The simplest one: class weights. They can be computed with sklearn's compute_class_weight function.
    2. A more modern approach: focal loss (https://arxiv.org/abs/1708.02002). Roughly, this loss function focuses the network's attention on hard-to-detect objects (by down-weighting the loss on easy, well-classified examples, so that hard ones contribute relatively more), which includes objects of under-represented classes.
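
As a minimal sketch of the class-weight idea in item 1: the snippet below computes "balanced" weights, n_samples / (n_classes * count), which is the heuristic sklearn's compute_class_weight applies when called with class_weight="balanced". The class names are hypothetical, and the counts are the per-class image counts from the question; for a detector, per-class box counts may be a better input. How the weights are then fed into the loss depends on the YOLO implementation and is not shown here.

```python
def balanced_class_weights(counts):
    """Return a weight per class, inversely proportional to its frequency.

    counts: dict mapping class name -> number of examples of that class.
    Formula: n_samples / (n_classes * count), i.e. the "balanced" heuristic.
    """
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {name: n_samples / (n_classes * n) for name, n in counts.items()}


# Hypothetical class names; counts taken from the question.
counts = {"class_a": 12000, "class_b": 1000, "class_c": 1000}
weights = balanced_class_weights(counts)

for name, weight in weights.items():
    print(f"{name}: {weight:.3f}")
```

With these counts, each rare class gets a weight 12 times larger than the frequent class, which mirrors the 12:1 imbalance in the data.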

    Your low-confidence problem may be a consequence of underfitting. This is from personal experience with two-stage detectors (mostly Faster R-CNN).
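
The focal loss from item 2 can be sketched for a single binary prediction as follows. This is an illustrative pure-Python version of the formula in the linked paper, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), using the paper's suggested gamma = 2 and alpha = 0.25. The function names are made up for this example, and this is not the loss code of YOLOR or of any specific detector.

```python
import math


def cross_entropy(p, y, eps=1e-7):
    """Plain binary cross-entropy for one prediction (for comparison)."""
    p = min(max(p, eps), 1.0 - eps)   # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p    # probability assigned to the true class
    return -math.log(p_t)


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss for one prediction.

    p: predicted probability of the positive class, in [0, 1]
    y: ground-truth label, 1 (positive) or 0 (negative)
    """
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)^gamma shrinks the loss of confident, correct predictions.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An "easy" positive (predicted 0.9) versus a "hard" positive (predicted 0.1).
for label, p in [("easy", 0.9), ("hard", 0.1)]:
    ce = cross_entropy(p, 1)
    fl = focal_loss(p, 1)
    print(f"{label}: CE={ce:.4f}  FL={fl:.4f}  FL/CE={fl / ce:.4f}")
```

The ratio FL/CE is much smaller for the easy example than for the hard one, so hard examples (which often belong to the rare classes) dominate the total loss.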