I'm trying to write a categorical cross-entropy loss function to better understand the intuition behind it. So far my implementation looks like this:
import numpy as np

# Observations
y_true = np.array([[0, 1, 0], [0, 0, 1]])
y_pred = np.array([[0.05, 0.95, 0.05], [0.1, 0.8, 0.1]])

# Loss calculations
def categorical_loss():
    loss1 = -(0.0 * np.log(0.05) + 1.0 * np.log(0.95) + 0 * np.log(0.05))
    loss2 = -(0.0 * np.log(0.1) + 0.0 * np.log(0.8) + 1.0 * np.log(0.1))
    loss = (loss1 + loss2) / 2  # divided by 2 because y_true and y_pred have 2 observations (and 3 classes)
    return loss

# Show loss
print(categorical_loss())  # 1.176939193690798
However, I do not understand how the function should behave to return the correct value when:

- y_pred is 0 or 1, because then the log function returns -inf or 0, and what the code implementation should look like in this case
- y_true is 0, because multiplication by 0 always returns 0 and the value of np.log(0.95) will then be discarded, and what the code implementation should look like in this case as well

Regarding y_pred being 0 or 1, digging into the Keras backend source code for both binary_crossentropy and categorical_crossentropy, we get:
def binary_crossentropy(target, output, from_logits=False):
    if not from_logits:
        output = np.clip(output, 1e-7, 1 - 1e-7)
        output = np.log(output / (1 - output))
    return (target * -np.log(sigmoid(output)) +
            (1 - target) * -np.log(1 - sigmoid(output)))

def categorical_crossentropy(target, output, from_logits=False):
    if from_logits:
        output = softmax(output)
    else:
        output /= output.sum(axis=-1, keepdims=True)
    output = np.clip(output, 1e-7, 1 - 1e-7)
    return np.sum(target * -np.log(output), axis=-1, keepdims=False)
As you can clearly see, both functions clip the output (i.e. the predictions) in order to avoid infinities from the logarithms:
output = np.clip(output, 1e-7, 1 - 1e-7)
So, here y_pred will never be exactly 0 or 1 in the underlying calculations. The handling is similar in other frameworks.
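To make this concrete, here is a minimal sketch of how the hand-written loss from the question could be generalized with the same clipping idea. It is not the Keras implementation: the function signature, the `eps` parameter, and the `y_pred_extreme` array are assumptions made for illustration, and the row normalization step from the Keras snippet is deliberately left out so that the result matches the value printed in the question.

```python
import numpy as np

EPS = 1e-7  # assumed clipping constant, same value as in the Keras snippet


def categorical_loss(y_true, y_pred, eps=EPS):
    """Mean categorical cross-entropy over observations, with clipping."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip predictions into [eps, 1 - eps] so np.log never sees exactly 0 or 1
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    # One loss value per observation (row), then average over observations
    per_sample = -np.sum(y_true * np.log(y_pred), axis=-1)
    return per_sample.mean()


# Data from the question
y_true = np.array([[0, 1, 0], [0, 0, 1]])
y_pred = np.array([[0.05, 0.95, 0.05], [0.1, 0.8, 0.1]])
loss = categorical_loss(y_true, y_pred)

# Hypothetical extreme predictions: exactly 0 and 1
y_pred_extreme = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
loss_extreme = categorical_loss(y_true, y_pred_extreme)

print(loss)
print(loss_extreme)  # large but finite, thanks to the clipping
```

With the clipping in place, a confidently wrong prediction of exactly 0 for the true class contributes -log(1e-7) instead of infinity, so the loss stays finite.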
Regarding y_true being 0, there is no issue involved: the respective terms are set to 0, as they should be according to the mathematical definition.
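A small sketch may help with the intuition here. For one-hot targets, the zero entries of y_true remove every term except the one for the true class, so the loss of each observation is simply the negative log of the probability predicted for its true class. Note that in the question's first observation, np.log(0.95) is multiplied by 1.0 and therefore kept, not discarded. The variable names below (`full_sum`, `true_class`, `picked`) are illustrative assumptions.

```python
import numpy as np

# Data from the question
y_true = np.array([[0, 1, 0], [0, 0, 1]])
y_pred = np.array([[0.05, 0.95, 0.05], [0.1, 0.8, 0.1]])

# Full definition: sum over all classes, zero-target terms vanish
full_sum = -np.sum(y_true * np.log(y_pred), axis=-1)

# Equivalent shortcut for one-hot targets: pick the true-class probability
true_class = y_true.argmax(axis=-1)  # index of the 1 in each row
picked = -np.log(y_pred[np.arange(len(y_pred)), true_class])

same = np.allclose(full_sum, picked)
print(full_sum, picked, same)
```

Both computations give the same per-observation losses, which shows that the zero terms carry no information the loss needs.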