tensorflow, keras, neural-network, classification, efficientnet

Does image classification transfer learning require negative examples?


The task is to determine which of 3 classes an image belongs to, or none of them.

I received a ready-made model: EfficientNet B4 with ImageNet weights, fine-tuned via transfer learning to identify 4 classes: the 3 target ones and a 4th "None" class. The latter was trained on examples of random images not containing any of the target objects.

The question is whether this is the correct approach – is the 4th class needed?

My intuition is that the network should be trained on the 3 target classes only. If all output probabilities stay below some threshold (90%?), the image should be considered as NOT containing any of the target objects. Am I right?


Solution

  • Due to the nature of the softmax function and the manner in which the network is trained, you need the 4th class.

    Let's see a concrete example: you train your network to distinguish between apples, oranges and bananas. However, you somehow get a photo of a plum.

    You might be surprised at first sight, but you do need the "other" class in your dataset. There is no guarantee that thresholding will help you detect the "other" class.

    You might expect one of the following two behaviors:

    1. The output probabilities would settle near 1/N for an unknown class, given that you are testing on an unknown (N+1)-th class.
    2. There would be a reliable threshold (like the 90% you assumed) below which the image can be declared "none of the classes".

    Now consider these counterexamples:

    1. What if an apple really looks like an orange, and your model correctly predicts 40% apple, 30% orange, 30% banana? Applying your threshold eliminates a correctly identified apple (a true positive) – a simple case in which you discard the good output of your network.
    2. You can still get a 91% assignment to a class even though the new "fruit" is not part of your dataset; this is due to the inherent calculations and the manner in which softmax works.
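    Both counterexamples can be reproduced with a few lines of NumPy (a minimal sketch – the logits below are made up purely for illustration):

    ```python
    import numpy as np

    def softmax(logits):
        # Subtract the max for numerical stability; the result always sums to 1.
        e = np.exp(logits - np.max(logits))
        return e / e.sum()

    # Case 1: a borderline apple. The top class is correct but only ~40% confident,
    # so a 90% threshold would throw away a true positive.
    borderline = softmax(np.array([1.0, 0.7, 0.7]))  # apple, orange, banana
    print(borderline, borderline.max())  # max is ~0.40, well under 0.9

    # Case 2: an out-of-distribution image (the plum). Softmax must distribute
    # all probability mass over the known classes, so one of them can still
    # come out above 90% confidence.
    plum = softmax(np.array([4.0, 1.0, 0.5]))
    print(plum, plum.max())  # max is ~0.93, despite the plum not being any class
    ```

    No matter how unfamiliar the input, the probabilities must sum to 1 – softmax has no way to express "none of the above" unless you give it a class for that.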

    Personal experience: I once trained a network to distinguish between many types of traffic signs. Out of pure curiosity, I gave it a picture of a living-room chair. I expected the same thing you did (thresholding), but much to my surprise, it reported 85% "Yield Way".
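    For reference, here is a minimal sketch of the 4-class setup the received model uses, assuming TensorFlow 2.x; the input size matches EfficientNet B4's default, and the optimizer/loss choices are placeholders:

    ```python
    import tensorflow as tf

    def build_classifier(num_classes=4, input_shape=(380, 380, 3), weights="imagenet"):
        # EfficientNet B4 backbone; weights="imagenet" downloads pretrained weights.
        base = tf.keras.applications.EfficientNetB4(
            include_top=False, weights=weights, input_shape=input_shape)
        base.trainable = False  # freeze the backbone for the initial transfer-learning phase

        model = tf.keras.Sequential([
            base,
            tf.keras.layers.GlobalAveragePooling2D(),
            # 4 outputs: the 3 target classes plus the explicit "None" class.
            tf.keras.layers.Dense(num_classes, activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="categorical_crossentropy",
                      metrics=["accuracy"])
        return model
    ```

    The "None" class should be trained on a varied set of negative images, so the network learns an explicit bucket for out-of-distribution inputs instead of being forced to spread its probability mass over the three target classes.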