My intent is to implement a custom loss function for training a model in Keras with TensorFlow as backend.
W and H represent, respectively, the width and height of the softmax layer’s output, and N is the batch size. The variable p is the probability predicted by the FCN for the correct class.
This loss function is from this paper.
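Spelled out, the loss as I read it from the paper (my paraphrase of their notation) is:

loss = -(1 / (N * W * H)) * sum(log(p))

with the sum taken over every pixel of every image in the batch.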
In this implementation, N is 4, W is 200, and H is 400.
The output shape of the final layer is (None, 400, 200, 2). A single label's shape is (400, 200, 2), where each channel represents a class.
Even though it cannot be used directly as a Keras loss, the NumPy implementation below is what I wanted to express:
import numpy as np

def loss_using_np(y_true, y_pred):
    '''
    Assuming `y_true` and `y_pred` have shape (400, 200, 2).
    This might change to (None, 400, 200, 2) while training in batches?
    '''
    dx = 0.0000000000000001  # very small value to avoid -infinity when taking the log
    y_pred = y_pred + dx
    class_one_pred = y_pred[:, :, 0]
    class_two_pred = y_pred[:, :, 1]
    class_one_mask = y_true[:, :, 0] == 1.0
    class_two_mask = y_true[:, :, 1] == 1.0
    class_one_correct_prob_sum = np.sum(np.log(class_one_pred[class_one_mask]))
    class_two_correct_prob_sum = np.sum(np.log(class_two_pred[class_two_mask]))
    N = 4
    H = 400
    W = 200
    return -1 * ((class_one_correct_prob_sum + class_two_correct_prob_sum) / (N * H * W))
The above implementation gives the expected output; too bad it cannot be used as a Keras loss.
y_true = np.random.randint(2, size=(400, 200, 2))
y_pred = np.random.random((400, 200, 2))
loss_using_np(y_true, y_pred)
The #01 approach:

import tensorflow as tf  # not a good practice to not use keras.backend?
import keras

def loss_function(y_true, y_pred):
    # Not a working solution, as it raises
    # ResourceExhaustedError: OOM when allocating tensor with shape[311146,3,400,2] BUT WHY?
    N = 4  # batch size
    W = 200
    H = 400
    dx = 0.0000000000000001
    y_pred = tf.add(y_pred, dx)
    class_one_gt = y_true[:, :, :, 0]
    class_one_mask = tf.where(tf.equal(class_one_gt, 1.0))
    # Bad to use `tf.gather`. Issues a warning:
    # `Converting sparse IndexedSlices to a dense Tensor of unknown shape.`
    class_one_prob_sum = keras.backend.sum(
        keras.backend.log(tf.gather(y_pred[:, :, :, 0], class_one_mask)))
    class_two_gt = y_true[:, :, :, 1]
    class_two_mask = tf.where(tf.equal(class_two_gt, 1.0))
    class_two_prob_sum = keras.backend.sum(
        keras.backend.log(tf.gather(y_pred[:, :, :, 1], class_two_mask)))
    print("This will be printed only once; it won't be printed every time the loss is calculated. How to log?")
    return -1 * ((class_one_prob_sum + class_two_prob_sum) / (N * W * H))
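As a side note, I now suspect (an assumption on my part, not something I verified on this exact setup) that the OOM comes from tf.gather indexing only along axis 0: each [batch, row, col] triple produced by tf.where gathers a whole slice instead of a single element. tf.gather_nd treats each row of the index tensor as full coordinates and returns just the selected values, roughly:

# Sketch: gather full [batch, row, col] coordinates at once.
# tf.gather_nd with [num_true, 3] indices on a (N, H, W) tensor
# yields a 1-D tensor of length num_true.
class_one_prob_sum = keras.backend.sum(
    keras.backend.log(tf.gather_nd(y_pred[:, :, :, 0], class_one_mask)))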
The #02 approach:

def loss_function(y_true, y_pred):
    N = 4
    H = 400
    W = 200
    dx = tf.constant(0.0000000000000001, dtype=tf.float32)
    correct_probs = tf.boolean_mask(y_pred, tf.equal(y_true, 1.0))
    correct_probs = tf.add(correct_probs, dx)
    return (-1 * keras.backend.sum(keras.backend.log(correct_probs))) / (N * H * W)
For this #02 approach I'm getting a warning:

UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.

Can you tell me how to implement this loss function without any warnings? I'm not confident that #02 is the right implementation. I'm looking for an optimized solution. Any help or pointers would be much appreciated.
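One mask-free variant I sketched while experimenting (untested end-to-end, and it assumes y_true is strictly one-hot) multiplies the log-probabilities by the ground truth instead of gathering, so all shapes stay static:

import tensorflow as tf
import keras.backend as K

def loss_function_masked(y_true, y_pred):
    # With one-hot y_true, the multiplication zeroes out the wrong-class terms,
    # so no dynamic-shape ops (tf.where / tf.boolean_mask) are needed.
    N, H, W = 4, 400, 200
    dx = 1e-16  # avoid log(0)
    return -K.sum(y_true * K.log(y_pred + dx)) / (N * H * W)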
I tried to understand what's happening inside loss_function() using print statements, but they are printed only once, when I compile the model. Is there any way to log this?
As mentioned by @dennis-ec, one can use tf.Print() for debugging.
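A minimal sketch of how that would look inside the #02 loss (my own illustration, not from the comment; tf.Print passes the tensor through unchanged and logs the listed values every time the graph evaluates it):

# Pipe the tensor through tf.Print so its stats are logged at every step.
correct_probs = tf.Print(correct_probs,
                         [tf.reduce_min(correct_probs), tf.reduce_max(correct_probs)],
                         message='correct_probs min/max: ')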
I'm using Keras 2.1.4 with TensorFlow 1.4.0-rc1 and Python 3.5.2.
To me, it seems like the authors are using a vanilla binary cross-entropy loss for multi-label classification. They also name it as such, but their definition is a bit odd compared to how you would implement it in Keras.
Basically, you could use binary_crossentropy as the loss function and supply the labels as arrays of shape (400, 200, 1), where a 0 denotes the first class and a 1 denotes the second class. The output of your network would then have the same shape, with a sigmoid activation function at each output node. This is how semantic segmentation models are usually implemented in Keras. See this repo for an example:
# final layer, sigmoid activations
conv10 = Conv2D(1, 1, activation = 'sigmoid')(conv9)
model = Model(input = inputs, output = conv10)
# binary_crossentropy loss for multi-label classification
model.compile(optimizer = Adam(lr = 1e-4), loss = 'binary_crossentropy', metrics = ['accuracy'])
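If your labels are currently two-channel one-hot arrays, a quick conversion for this setup (my own snippet, assuming channel 1 is the positive class) would be:

# Keep only channel 1; shape goes from (400, 200, 2) to (400, 200, 1).
y_true_binary = y_true_two_channel[..., 1:]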
This should give exactly the same result as the implementation defined in the paper (they probably did not use Keras).
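To see why (my own derivation, not from the paper): with a single sigmoid output p per pixel and label y in {0, 1}, the per-pixel binary cross-entropy is

-(y * log(p) + (1 - y) * log(1 - p))

which is exactly -log of the probability the model assigns to the correct class. Averaged over all N * W * H pixels, this matches the paper's -(1 / (N * W * H)) * sum(log(p)), since a two-way softmax and a single sigmoid are equivalent (the two channels are p and 1 - p).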