Assuming the log loss equation to be:
$$\text{logLoss} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log(p_i) + (1-y_i)\log(1-p_i)\right]$$
where $N$ is the number of samples, $y_1,\dots,y_N$ are the actual values of the dependent variable, and $p_1,\dots,p_N$ are the predicted probabilities from the logistic regression.
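To make the formula concrete, here is a minimal NumPy sketch of it (the function name, the example arrays, and the clipping constant `eps` are my own choices, not from any particular library):

```python
import numpy as np

def log_loss(y, p, eps=1e-15):
    """Mean negative log-likelihood for binary labels y and predicted probabilities p."""
    y = np.asarray(y, dtype=float)
    # Clip probabilities away from exactly 0 and 1 so the logs stay finite.
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Two confident correct predictions and one confident wrong one:
print(log_loss([1, 0, 1], [0.9, 0.1, 0.2]))  # ~0.61, dominated by the mismatched third sample
```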
How I am looking at it: if $y_i = 0$, then the first part $y_i\log(p_i) = 0$; alternatively, if $y_i = 1$, then the second part $(1-y_i)\log(1-p_i) = 0$. So, depending on the value of $y_i$, one part of the equation is excluded. Am I understanding this correctly?
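For example, with made-up numbers: if $y_i = 1$ and $p_i = 0.8$, the per-sample contribution reduces to
$$-\left[1\cdot\log(0.8) + 0\cdot\log(0.2)\right] = -\log(0.8) \approx 0.223,$$
while if $y_i = 0$ with the same $p_i = 0.8$, it reduces to $-\log(1-0.8) = -\log(0.2) \approx 1.609$.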
My ultimate goal is to understand how to interpret the results of log loss.
Yes, you are on the right track. Keeping in mind that $p_i = P(y_i = 1)$, the basic idea is that the loss function needs to be defined in such a way that it penalizes the tuples for which the prediction does not match the actual label (e.g., when $y_i = 1$ but $p_i$ is low, taken care of by the $y_i\log(p_i)$ part, or when $y_i = 0$ but $p_i$ is high, taken care of by the $(1-y_i)\log(1-p_i)$ part), and at the same time does not penalize much the tuples for which the prediction matches the actual label (e.g., when $y_i = 1$ and $p_i$ is high, or when $y_i = 0$ and $p_i$ is low).
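Here is a quick numeric illustration of that property (the probabilities are arbitrary, and `per_sample_loss` is just a helper name I picked):

```python
import numpy as np

def per_sample_loss(y, p):
    """Cross-entropy contribution of a single (label, probability) pair."""
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# Prediction matches the label: small penalty.
print(per_sample_loss(1, 0.95))  # ~0.05
print(per_sample_loss(0, 0.05))  # ~0.05

# Prediction contradicts the label: large penalty.
print(per_sample_loss(1, 0.05))  # ~3.00
print(per_sample_loss(0, 0.95))  # ~3.00
```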
The loss function for logistic regression (cross entropy) has exactly this desired property, as can be seen from the following figure.
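Here is a small matplotlib sketch reproducing the figure (my own reconstruction, assuming the figure plots the per-sample loss against the predicted probability for each of the two labels):

```python
import numpy as np
import matplotlib.pyplot as plt

p = np.linspace(0.001, 0.999, 500)

# Per-sample cross entropy as a function of the predicted probability p_i.
plt.plot(p, -np.log(p), label=r"$y_i = 1$: $-\log(p_i)$")
plt.plot(p, -np.log(1 - p), label=r"$y_i = 0$: $-\log(1 - p_i)$")
plt.xlabel(r"predicted probability $p_i$")
plt.ylabel("per-sample loss")
plt.legend()
plt.show()
```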