python · keras · neural-network · loss-function · loss

Meaning of Loss function in Keras?


I made a neural network with Keras in Python and cannot really understand what the loss function means.

So first, some general information: I worked with the poker hand dataset, which has classes 0-9, and encoded the labels as one-hot vectors. I used the softmax activation in the last layer, so for each of the 10 entries of the output vector my network gives the probability that the sample belongs to the corresponding class. For example: my true label is (0,1,0,0,0,0,0,0,0,0), which means class 1 (classes 0-9 run from nothing in hand to royal flush), and class 1 means one pair (if you know poker). From the neural net I get outputs like (0.4, 0.2, 0.1, 0.1, 0.2, 0, 0, 0, 0, 0), which means that my sample belongs with 40 percent probability to class 0, with 20 percent to class 1, and so on!
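Here is roughly how I set it up (just a sketch; the layer sizes here are made up, not my exact network):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical

# labels 0-9 become one-hot vectors, e.g. 1 -> (0,1,0,0,0,0,0,0,0,0)
y_train = to_categorical(np.array([1, 0, 2]), num_classes=10)

model = Sequential([
    Dense(32, activation='relu', input_shape=(10,)),  # the poker hand dataset has 10 features
    Dense(10, activation='softmax'),                  # one probability per class
])
```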

Alright! I also used binary cross-entropy as the loss, accuracy as the metric, and the RMSprop optimizer. When I use model.evaluate() from Keras, I get something like 0.16 for the loss, and I do not know how to interpret this. Does it mean that, on average, my predictions deviate 0.16 from the true values? So if my prediction for class 0 is 0.5, the truth could just as well be 0.66 or 0.34? Or how should I interpret it?
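This is roughly the compile/evaluate part (continuing from the model above; the random arrays are just stand-ins for my real data):

```python
# stand-ins for the real poker-hand data
x_train, x_test = np.random.rand(100, 10), np.random.rand(20, 10)
y_train = to_categorical(np.random.randint(0, 10, 100), num_classes=10)
y_test = to_categorical(np.random.randint(0, 10, 20), num_classes=10)

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',  # the loss I used
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, batch_size=32, verbose=0)

loss, acc = model.evaluate(x_test, y_test, verbose=0)
print(loss, acc)  # the loss came out around 0.16 for me
```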

Please send help!


Solution

First of all, according to your problem definition you have a multi-class problem, so you should use categorical_crossentropy; binary cross-entropy is for two-class problems or for multi-label classification.

That said, the value of the loss function can only be interpreted once you understand what cross-entropy actually computes. The formula is:
$$\mathrm{CE} = -\sum_{c=1}^{M} y_{o,c}\,\log(p_{o,c})$$
where $M$ is the number of classes, $y_{o,c}$ is the binary indicator (0 or 1) of whether class label $c$ is the correct classification for observation $o$, and $p_{o,c}$ is the predicted probability that observation $o$ belongs to class $c$.
For binary cross-entropy, $M = 2$; for categorical cross-entropy, $M > 2$. The cross-entropy therefore decreases as the predicted probability for the true class converges to 1:
[Plot: cross-entropy loss $-\log(p)$ versus predicted probability $p$ of the true class; the loss falls toward 0 as $p \to 1$ and grows without bound as $p \to 0$.]
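To make the formula concrete, here is the computation for the single prediction from the question (plain NumPy; the small epsilon only guards against log(0)):

```python
import numpy as np

y = np.array([0, 1, 0, 0, 0, 0, 0, 0, 0, 0])            # true label: class 1
p = np.array([0.4, 0.2, 0.1, 0.1, 0.2, 0, 0, 0, 0, 0])  # predicted probabilities

# only the true class's term survives the sum over c
ce = -np.sum(y * np.log(p + 1e-12))
print(ce)  # -log(0.2) ≈ 1.61
```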

Now let's take your example, where you have 10 classes and the true label is (0,1,0,0,0,0,0,0,0,0). If the loss for such a sample is 0.16, the sum collapses to the single true-class term, so

$$-\log(p_{o,1}) = 0.16 \quad\Rightarrow\quad p_{o,1} = e^{-0.16} \approx 0.85,$$

which means that your model has assigned a probability of 0.85 to the correct label.
So the loss function gives you the negative log of the probability your model assigns to the correct class. Since Keras computes the loss over whole batches, the reported value is the average of this negative log probability over all samples in the batch; with the evaluate function, it is the average over all the data you are evaluating.
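A minimal sketch of that interpretation, assuming the loss is the recommended categorical cross-entropy and using made-up per-sample probabilities:

```python
import numpy as np

# made-up probabilities the model assigned to the correct class of each sample
p_true = np.array([0.85, 0.90, 0.80])

loss = np.mean(-np.log(p_true))  # the average that evaluate() reports
print(loss)                      # ≈ 0.16
print(np.exp(-loss))             # ≈ 0.85, the (geometric) mean correct-class probability
```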