I've been experimenting with the region-based Dice loss, but there are so many variations on the internet that I could not find two identical implementations, and they all produce different results. Below are the implementations I found. Some use a smoothing factor (which the authors in this paper call epsilon), some apply it in both the numerator and the denominator, one implementation uses a gamma parameter, and so on.
Could someone please help me with the correct implementation?
import tensorflow as tf
import tensorflow.keras.backend as K
import numpy as np
def dice_loss1(y_true, y_pred, smooth=1e-6):
    y_pred = tf.convert_to_tensor(y_pred)
    y_true = tf.cast(y_true, y_pred.dtype)
    smooth = tf.cast(smooth, y_pred.dtype)
    y_pred = K.flatten(y_pred)
    y_true = K.flatten(y_true)
    intersection = K.sum(K.dot(y_true, y_pred))
    dice_coef = (2*intersection + smooth) / (K.sum(y_true) + K.sum(y_pred) + smooth)
    dice_loss = 1-dice_coef
    return dice_loss

def dice_loss2(y_true, y_pred, smooth=1e-6): # Only Smooth
    y_pred = tf.convert_to_tensor(y_pred)
    y_true = tf.cast(y_true, y_pred.dtype)
    smooth = tf.cast(smooth, y_pred.dtype)
    intersection = K.sum(K.abs(y_true * y_pred), axis=-1)
    dice_coef = (2. * intersection + smooth) / (K.sum(K.square(y_true),-1) + K.sum(K.square(y_pred),-1) + smooth)
    return 1- dice_coef

def dice_loss3(y_true, y_pred): # No gamma, no smooth
    y_pred = tf.convert_to_tensor(y_pred)
    y_true = tf.cast(y_true, y_pred.dtype)
    y_pred = tf.math.sigmoid(y_pred)
    numerator = 2 * tf.reduce_sum(y_true * y_pred)
    denominator = tf.reduce_sum(y_true + y_pred)
    return 1 - numerator / denominator

def dice_loss4(y_true, y_pred, smooth=1e-6, gama=1): # Gama + Smooth is used
    y_pred = tf.convert_to_tensor(y_pred)
    y_true = tf.cast(y_true, y_pred.dtype)
    smooth = tf.cast(smooth, y_pred.dtype)
    gama = tf.cast(gama, y_pred.dtype)
    nominator = 2 * tf.reduce_sum(tf.multiply(y_pred, y_true)) + smooth
    denominator = tf.reduce_sum(y_pred ** gama) + tf.reduce_sum(y_true ** gama) + smooth
    result = 1 - tf.divide(nominator, denominator)
    return result

y_true = np.array([[0,0,1,0]])    # remaining rows are cut off in the source
y_pred = np.array([[0,0,0.9,0]])  # remaining rows are cut off in the source
# print(dice_loss1(y_true, y_pred)) # Gives you error in K.dot()
print(dice_loss2(y_true, y_pred)) # provides array of values (it reduces over the last axis only)
print(dice_loss3(y_true, y_pred))
print(dice_loss4(y_true, y_pred))

I used a variation of the Dice loss for brain tumor segmentation. The Dice coefficient implementation I used was:
def dice_coef(y_true, y_pred, smooth=100):
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    dice = (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)
    return dice
In order to make it a loss, it needs to be made into a function we want to minimize. This can be accomplished by making it negative:
def dice_coef_loss(y_true, y_pred):
    return -dice_coef(y_true, y_pred)
or subtracting it from 1:
def dice_coef_loss(y_true, y_pred):
    return 1 - dice_coef(y_true, y_pred)
or applying some other function then negating - for example, taking the negative logarithm (which could smooth the gradients):
def dice_coef_loss(y_true, y_pred):
    return -K.log(dice_coef(y_true, y_pred))
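To make the difference between these three forms concrete, here is a small plain-Python sketch (not Keras code; the helper name `loss_variants` is made up for illustration) that evaluates each form for a few Dice values:

```python
import math

def loss_variants(dice):
    """Return the three loss forms for a Dice coefficient in (0, 1]."""
    return {
        "negated": -dice,             # ranges over [-1, 0)
        "one_minus": 1.0 - dice,      # ranges over [0, 1)
        "neg_log": -math.log(dice),   # ranges over [0, inf)
    }

for dice in (0.01, 0.5, 0.99):
    print(dice, loss_variants(dice))
```

The negated and subtracted forms differ only by a constant, so they give the same gradients. The negative-log form grows without bound as the Dice coefficient approaches 0, so poor predictions are penalized much more strongly.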
The variable smooth is the same thing you have seen in other implementations under various names (smoothing, epsilon, etc.). It exists to handle the case where the ground truth has very few or no white pixels (assuming white pixels mark a class or the boundary of an object, depending on your implementation). It also keeps the denominator from being zero when both the ground truth and the prediction are empty.
If smooth is set too low, then when the ground truth has few or no white pixels and the predicted image has some non-zero number of white pixels, the model is penalized more heavily. Setting smooth higher means that if the predicted image has a small number of white pixels when the ground truth has none, the loss value will be lower. Depending on how aggressive the model needs to be, though, a lower value may be the better choice.
Here's an illustrative example:
import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K

def dice_coef(y_true, y_pred, smooth):
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    dice = (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)
    return dice

def dice_coef_loss(y_true, y_pred, smooth):
    return 1 - dice_coef(y_true, y_pred, smooth)

if __name__ == '__main__':
    smooth = 10e-6
    y_pred = np.zeros((1, 128, 128))
    # one pixel is set to 1
    y_pred[0, 0, 0] = 1
    y_pred = tf.convert_to_tensor(y_pred, dtype=tf.float32)
    y_true = tf.zeros((1, 128, 128), dtype=tf.float32)
    print(dice_coef(y_true, y_pred, smooth=smooth))
    print(dice_coef_loss(y_true, y_pred, smooth=smooth))

will print out:
tf.Tensor(9.9999e-06, shape=(), dtype=float32)
tf.Tensor(0.99999, shape=(), dtype=float32)
But if smooth is set to 100:
tf.Tensor(0.990099, shape=(), dtype=float32)
tf.Tensor(0.009900987, shape=(), dtype=float32)
This shows the loss dropping to about 0.0099 instead of about 0.99999.
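For this particular case (empty ground truth, exactly one predicted pixel set to 1) the intersection is 0 and the sums are 0 and 1, so the loss reduces to 1 / (1 + smooth). A plain-Python sketch of that arithmetic, with a made-up helper name `one_false_positive_loss`, shows how the loss falls as smooth grows:

```python
def one_false_positive_loss(smooth):
    # Ground truth is all zeros and exactly one predicted pixel is 1, so
    # intersection = 0, sum(y_true) = 0 and sum(y_pred) = 1.
    dice = (2.0 * 0.0 + smooth) / (0.0 + 1.0 + smooth)
    return 1.0 - dice

for smooth in (1e-5, 1.0, 100.0):
    print(smooth, one_false_positive_loss(smooth))
```

Values of smooth between these extremes trade off how strongly a few stray predicted pixels are punished.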
For completeness, if you have multiple segmentation channels (B X W X H X K, where B is the batch size, W and H are the dimensions of your image, and K is the number of segmentation channels), the same concepts apply, but it can be implemented as follows:
def dice_coef_multilabel(y_true, y_pred, M, smooth):
    dice = 0
    for index in range(M):
        dice += dice_coef(y_true[:,:,:,index], y_pred[:,:,:,index], smooth)
    return dice
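Note that this function sums the per-channel Dice values, so its result lies between 0 and M rather than between 0 and 1. One option, sketched below in NumPy rather than Keras, is to divide by the number of channels before subtracting from 1. The names `dice_coef_np` and `mean_dice_loss_np` are made up for this illustration, and the averaging is an assumption, not something the code above does:

```python
import numpy as np

def dice_coef_np(y_true, y_pred, smooth):
    # NumPy counterpart of dice_coef: flatten, then apply the smoothed ratio.
    y_true_f = y_true.ravel()
    y_pred_f = y_pred.ravel()
    intersection = np.sum(y_true_f * y_pred_f)
    return (2.0 * intersection + smooth) / (np.sum(y_true_f) + np.sum(y_pred_f) + smooth)

def mean_dice_loss_np(y_true, y_pred, smooth):
    # Average the per-channel Dice values so that 1 - mean stays in [0, 1].
    num_channels = y_true.shape[-1]
    total = sum(
        dice_coef_np(y_true[..., k], y_pred[..., k], smooth)
        for k in range(num_channels)
    )
    return 1.0 - total / num_channels

rng = np.random.default_rng(0)
y_true = (rng.random((2, 8, 8, 3)) > 0.5).astype(np.float64)
print(mean_dice_loss_np(y_true, y_true, smooth=1.0))                 # perfect prediction
print(mean_dice_loss_np(y_true, np.zeros_like(y_true), smooth=1.0))  # empty prediction
```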
And it can be converted to a loss function through negation or subtraction, in the same way as dice_coef is. smooth could also be tuned per channel, if you supply a list or some other sequence (e.g., smooth_list):
def dice_coef_multilabel(y_true, y_pred, M, smooth_list):
    dice = 0
    for index in range(M):
        dice += dice_coef(y_true[:,:,:,index], y_pred[:,:,:,index], smooth_list[index])
    return dice
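The per-channel loop can also be written without an explicit loop. The following is a minimal NumPy sketch (not Keras code; `dice_per_channel_np` is a made-up name) that assumes channels-last inputs of shape (B, W, H, K) and lets broadcasting apply one smooth value per channel:

```python
import numpy as np

def dice_per_channel_np(y_true, y_pred, smooth_list):
    # One smoothing value per channel, broadcast against the K per-channel sums.
    smooth = np.asarray(smooth_list, dtype=np.float64)
    axes = (0, 1, 2)  # reduce over batch, width and height; keep channels
    intersection = np.sum(y_true * y_pred, axis=axes)
    totals = np.sum(y_true, axis=axes) + np.sum(y_pred, axis=axes)
    return (2.0 * intersection + smooth) / (totals + smooth)

rng = np.random.default_rng(0)
y_true = (rng.random((2, 8, 8, 3)) > 0.5).astype(np.float64)
y_pred = rng.random((2, 8, 8, 3))
per_channel = dice_per_channel_np(y_true, y_pred, smooth_list=[1.0, 10.0, 100.0])
print(per_channel, per_channel.sum())
```

Summing `per_channel` gives the same quantity the loop accumulates, while the individual entries make it easy to see which channel is lagging.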