
What exactly are the losses in Matterport Mask-R-CNN?

I use Mask R-CNN to train on my own data. When I use TensorBoard to view the results, I see loss, mrcnn_bbox_loss, mrcnn_class_loss, mrcnn_mask_loss, rpn_bbox_loss, rpn_class_loss, and the same six losses for validation: val_loss, val_mrcnn_bbox_loss, etc.

I want to know what each loss is exactly.

I also want to know whether the first six losses are the training losses. If they aren't, how can I see the training loss?

My guess is:

loss: all five losses combined (but I don't know how TensorBoard combines them).

mrcnn_bbox_loss: is the size of the bounding box correct or not?

mrcnn_class_loss: is the class correct? is the pixel correctly assigned to the class?

mrcnn_mask_loss: is the shape of the instance correct or not? is the pixel correctly assigned to the instance?

rpn_bbox_loss: is the size of the bbox correct?

rpn_class_loss: is the class of the bbox correct?

But I am pretty sure this is not right...

And are some losses irrelevant if I have only one class, for example only the background and one other class?

My data has only the background and one other class, and this is my result in TensorBoard:

[Images: Result 1, Result 2, Result 3, Result 4]

My prediction is OK, but I don't know why some of my validation losses go up and down at the end. I thought they would first only go down and then, after overfitting, only go up. The prediction I used is the green line in TensorBoard with the most epochs. I am not sure whether my network is overfitted, so I am wondering why some of the validation losses look the way they do.

[Images: my prediction; an example from my training set; the ground truth of my test set example; the prediction for the test set example]


  • According to both the code comments and the documentation in the Python Package Index, these losses are defined as:

    • rpn_class_loss = RPN anchor classifier loss
    • rpn_bbox_loss = RPN bounding box loss
    • mrcnn_class_loss = loss for the classifier head of Mask R-CNN
    • mrcnn_bbox_loss = loss for Mask R-CNN bounding box refinement
    • mrcnn_mask_loss = mask binary cross-entropy loss for the masks head

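    Each of these loss metrics aggregates the loss values calculated individually for each region of interest (or, for the two RPN losses, each anchor). The overall loss reported in the log is the sum of the other five losses, as defined by the Mask R-CNN authors; you can check this by adding them up, although the logged total may come out slightly higher if the implementation also adds a weight-regularization term.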
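
    As a rough check of the "sum of five" relationship, the component values read off TensorBoard can be added up by hand. The sketch below uses made-up numbers, and the `total_loss` helper and its optional `weights` argument are hypothetical names introduced only for illustration; they are not part of the Mask R-CNN package.

```python
# Hypothetical values for one epoch, invented for illustration only.
component_losses = {
    "rpn_class_loss": 0.01,
    "rpn_bbox_loss": 0.25,
    "mrcnn_class_loss": 0.10,
    "mrcnn_bbox_loss": 0.20,
    "mrcnn_mask_loss": 0.30,
}


def total_loss(components, weights=None):
    """Sum the component losses, optionally scaling each by a weight.

    With no weights given, every component counts with weight 1.0,
    which corresponds to a plain sum of the five values.
    """
    weights = weights or {}
    return sum(value * weights.get(name, 1.0)
               for name, value in components.items())


# Compare this number with the `loss` curve for the same epoch.
print(round(total_loss(component_losses), 4))
```

    If the printed value is slightly below the logged `loss`, the difference is most likely an extra term such as weight regularization rather than a sixth task loss.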

    In terms of how these losses are calculated as per the original paper, they can be described as follows (note that the definitions are quite rough for the sake of a more intuitive explanation):

    • The classification loss values depend on the confidence score of the true class, so the classification losses reflect how confident the model is when predicting the class labels, or in other words, how close the model is to predicting the correct class. In the case of mrcnn_class_loss, all the object classes are covered, whereas in the case of rpn_class_loss the only classification that is done is labelling the anchor boxes as foreground or background (which is why this loss tends to have lower values, as conceptually there are only 'two classes' that can be predicted).
    • The bounding box loss values reflect the distance between the true box parameters (that is, the (x, y) coordinates of the box location, its width and its height) and the predicted ones. It is by nature a regression loss, and it penalizes larger absolute differences (quadratically for small differences, and linearly for larger ones; see the Smooth L1 loss function for more insight). Hence, it ultimately shows how good the model is at locating objects within the image, in the case of rpn_bbox_loss; and how good the model is at precisely predicting the area(s) within an image corresponding to the different objects that are present, in the case of mrcnn_bbox_loss.
    • The mask loss, similarly to the classification loss, penalizes wrong per-pixel binary classifications (foreground/background, with respect to the true class label). It is calculated differently for each of the regions of interest: Mask R-CNN encodes a binary mask per class for each of the RoIs, and the mask loss for a specific RoI is calculated based only on the mask corresponding to its true class, which prevents the mask loss from being affected by class predictions.
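
    The three kinds of loss described above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the package's actual code: the function names are hypothetical, each function handles a single value or a single RoI, and the real implementation works on batched tensors.

```python
import math


def class_loss(true_class_prob):
    """Cross-entropy for one prediction: depends only on the
    probability the model assigned to the true class."""
    return -math.log(true_class_prob)


def smooth_l1(predicted, target):
    """Smooth L1 for one box parameter: quadratic for absolute
    differences below 1, linear for larger differences."""
    diff = abs(predicted - target)
    return 0.5 * diff ** 2 if diff < 1.0 else diff - 0.5


def mask_loss_for_roi(pred_masks, true_mask, true_class, eps=1e-7):
    """Mean per-pixel binary cross-entropy for one RoI.

    pred_masks maps class id -> list of per-pixel foreground
    probabilities; only the mask of the true class is used.
    """
    probs = pred_masks[true_class]
    total = 0.0
    for p, t in zip(probs, true_mask):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(true_mask)


# Toy RoI with 4 pixels and two classes (ids 1 and 2), true class 1.
masks = {1: [0.9, 0.8, 0.2, 0.1], 2: [0.5, 0.5, 0.5, 0.5]}
print(mask_loss_for_roi(masks, [1, 1, 0, 0], true_class=1))
```

    Changing the predicted mask for class 2 in this toy example leaves the result unchanged, which is the decoupling between mask loss and class prediction described in the last bullet.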

    As you already said, these loss metrics are indeed training losses, and the ones with the val_ prefix are the validation losses. Fluctuations in the validation loss can occur for several different reasons, and it's hard to pinpoint the cause from your charts alone. They might be caused by a learning rate that is too high (making the stochastic gradient descent overshoot when trying to find a minimum), or a validation set that is too small (which gives unreliable loss values, as small changes in the output can produce big loss value changes).
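
    To see why a small validation set gives a noisier loss curve, here is a small simulation. It is only a sketch under an artificial assumption: per-image losses are drawn uniformly at random between 0 and 1, which is not how real losses behave, and the function name is hypothetical. It measures how much the average loss varies between repeated validation sets of a given size.

```python
import random
import statistics


def validation_loss_spread(set_size, repeats=200, seed=0):
    """Standard deviation of the mean loss across `repeats`
    simulated validation sets of `set_size` images each."""
    rng = random.Random(seed)
    means = []
    for _ in range(repeats):
        # Artificial per-image losses, uniform in [0, 1].
        per_image = [rng.uniform(0.0, 1.0) for _ in range(set_size)]
        means.append(statistics.mean(per_image))
    return statistics.pstdev(means)


# The averaged loss fluctuates far more with 5 images than with 500.
print(validation_loss_spread(5), validation_loss_spread(500))
```

    The spread shrinks roughly with the square root of the set size, so a curve averaged over a handful of validation images can jump around even when the model itself changes very little.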