I have used resnet50 to solve a multi-class classification problem. The model outputs probabilities for each class. Which loss function should I choose for my model?
After choosing binary cross entropy:
After choosing categorical cross entropy:
The above results are for the same model, with only the loss function changed. The model is supposed to classify images into 26 classes, so categorical cross entropy should work. Also, in the first case the accuracy is about 96%, but the loss is very high. Why?
You definitely need to use categorical_crossentropy for a multi-class classification problem. binary_crossentropy treats each of the 26 outputs as a separate yes/no decision, so in effect it turns your problem into a set of independent binary classification problems rather than a single choice among 26 classes.
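One likely contributor to the 96% figure is worth illustrating. As far as I know, when the loss is binary_crossentropy, Keras resolves the 'accuracy' metric to an element-wise binary accuracy, which scores each of the 26 outputs separately. The sketch below is plain Python, not Keras code, and the helper names and the example prediction are made up for illustration. It shows that a prediction that picks the wrong class can still score 25/26 on that metric.

```python
NUM_CLASSES = 26


def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of individual outputs that land on the correct side of the threshold."""
    hits = sum((p > threshold) == (t == 1) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def categorical_accuracy(y_true, y_pred):
    """1.0 if the highest-scoring class is the true class, otherwise 0.0."""
    return float(y_pred.index(max(y_pred)) == y_true.index(max(y_true)))


# Hypothetical sample: the true class is index 3, but the model gives low
# scores everywhere and its highest score goes to the wrong class (index 0).
y_true = [0] * NUM_CLASSES
y_true[3] = 1
y_pred = [0.01] * NUM_CLASSES
y_pred[0] = 0.2

bin_acc = binary_accuracy(y_true, y_pred)
cat_acc = categorical_accuracy(y_true, y_pred)

print(f"element-wise binary accuracy: {bin_acc:.4f}")
print(f"categorical accuracy:         {cat_acc:.1f}")
```

Here 25 of the 26 outputs are correctly "off", so the element-wise score is about 0.96 even though the predicted class is wrong. If that is what is happening in your run, the 96% says little about how often the right class is chosen.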
I would say that the reason you are seeing high accuracy in the first (and to some extent the second) case is that you are overfitting. The first dense layer you are adding contains about 8 million parameters (run model.summary() to see this), and you only have 70k images and 8 epochs to train it. This architectural choice is very demanding in both computing power and data. You are also using a very basic optimizer (SGD). Try a more adaptive one such as Adam.
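To get a feel for where a number like 8 million comes from, here is a small back-of-the-envelope sketch. The layer sizes are assumptions for illustration, not taken from your model: 2048 is the channel count of ResNet50's final feature maps, and 4096 is a hypothetical hidden width chosen because it lands near 8 million.

```python
def dense_params(n_inputs, n_units):
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return n_inputs * n_units + n_units


FEATURES = 2048      # assumed: ResNet50 features after global average pooling
HIDDEN_UNITS = 4096  # hypothetical hidden layer width
NUM_CLASSES = 26

# A wide hidden layer on top of the backbone.
big_head = dense_params(FEATURES, HIDDEN_UNITS)

# Going straight from the pooled features to the 26 class outputs.
small_head = dense_params(FEATURES, NUM_CLASSES)

print(f"wide hidden layer:   {big_head:,} parameters")
print(f"direct 26-way layer: {small_head:,} parameters")
```

Under these assumptions, the wide layer needs roughly 150 times as many parameters as a direct 26-way layer, which is a lot to fit from 70k images.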
Finally, I am a bit surprised by your choice of a 'sigmoid' activation function in the output layer. Why not the more standard 'softmax'?
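The difference between the two activations can be sketched in a few lines of plain Python. The logits below are made-up values, and the functions are minimal illustrations rather than the Keras implementations.

```python
import math


def sigmoid(logits):
    """Squash each logit independently into (0, 1)."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def softmax(logits):
    """Turn logits into a probability distribution over mutually exclusive classes."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for a 4-class toy example.
logits = [2.0, 1.0, 0.5, -1.0]

sig = sigmoid(logits)
soft = softmax(logits)

print("sigmoid outputs:", [round(p, 3) for p in sig], "sum =", round(sum(sig), 3))
print("softmax outputs:", [round(p, 3) for p in soft], "sum =", round(sum(soft), 3))
```

The sigmoid outputs do not sum to 1, since each one is an independent "is it this class?" score. The softmax outputs do sum to 1, which is what categorical_crossentropy expects when every image belongs to exactly one of the 26 classes.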