python tensorflow conv-neural-network object-detection bounding-box

Implementing Intersection over Union Loss Using Tensorflow

This may be more of a Tensorflow gradient question. I have been attempting to implement Intersection over Union (IoU) as losses and have been running into some problems. To the point, here is the snippet of my code that computes the IoU:

def get_iou(masks, predictions):
    ious = []
    for i in range(batch_size):
        mask = masks[i]
        pred = predictions[i]
        masks_sum = tf.reduce_sum(mask)
        predictions_sum = tf.reduce_mean(pred)
        intersection = tf.reduce_sum(tf.multiply(mask, pred))
        union = masks_sum + predictions_sum - intersection
        iou = intersection / union
        ious.append(iou)
    return ious

iou = get_iou(masks, predictions)
mean_iou_loss = -tf.log(tf.reduce_sum(iou))
train_op = tf.train.AdamOptimizer(0.001).minimize(mean_iou_loss)

It works as predicted. However, the issue that I am having is the losses do not decrease. The model does train, though the results are less than ideal so I am wondering if I am implementing it correctly. Do I have to compute the gradients myself? I can compute the gradients for this IoU loss derived by this paper using tf.gradients(), though I am not sure how to incorporate that with the tf.train.AdamOptimizer(). Reading the documentation, I feel like compute_gradients and apply_gradients are the commands that I need to use, but I can't find any examples on how to use them. My understanding is that the Tensorflow graph should be able to come up with the gradient itself via chain rule. So is a custom gradient even necessary in this problem? If the custom gradient is not necessary then I may just have an ill-posed problem and need to adjust some hyperparameters.

Note: I have tried Tensorflow's implementation of the IoU, tf.metrics.mean_iou(), but it spits out inf every time so I have abandoned that.

Solution

Gradient computation occurs inside optimizer.minimize function, so, no explicit use inside loss function is needed. However, your implementation simply lacks an optimizable, trainable variable.

iou = get_iou(masks, predictions)
mean_iou_loss = tf.Variable(initial_value=-tf.log(tf.reduce_sum(iou)), name='loss', trainable=True)
train_op = tf.train.AdamOptimizer(0.001).minimize(mean_iou_loss)

Numerical stability, differentiability and particular implementation aside, this should be enough to use it as a loss function, which will change with iterations.

Also take a look:

https://arxiv.org/pdf/1902.09630.pdf

Why does one not use IOU for training?