
Does binary log loss exclude one part of equation based on y?


Assuming the log loss equation to be:

logLoss = −(1/N) ∑_{i=1}^{N} [ y_i log(p_i) + (1 − y_i) log(1 − p_i) ]

where N is the number of samples, y_1, …, y_N are the actual values of the dependent variable, and p_1, …, p_N are the predicted probabilities from the logistic regression.

How I am looking at it:

if y_i = 0, then the first part y_i log(p_i) = 0

Alternatively, if y_i = 1, then the second part (1 − y_i) log(1 − p_i) = 0

So, depending on the value of y_i, one part of the equation is excluded. Am I understanding this correctly?

My ultimate goal is to understand how to interpret the results of log loss.


Solution

  • Yes, you are on the right track. Keeping in mind that p_i = P(y_i = 1), the idea is that the loss function must penalize samples for which the prediction does not match the actual label: when y_i = 1 but p_i is low, the y_i log(p_i) part takes care of the penalty, and when y_i = 0 but p_i is high, the (1 − y_i) log(1 − p_i) part does. At the same time, it should add little penalty when the prediction matches the label, i.e., when y_i = 1 and p_i is high, or when y_i = 0 and p_i is low. (A runnable sketch of this is given after the figure below.)

    The loss function for logistic regression (cross entropy) has exactly this desired property, as the following figure shows.

    [Figure: per-sample cross-entropy loss as a function of the predicted probability p, for y = 1 and y = 0]
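To make the "one term drops out" point concrete, here is a minimal sketch (the labels and probabilities below are made up for illustration) that computes the per-sample terms and checks their average against sklearn.metrics.log_loss:

```python
import numpy as np
from sklearn.metrics import log_loss

y_true = np.array([1, 1, 0, 0])          # actual labels y_i
p_pred = np.array([0.9, 0.2, 0.1, 0.8])  # predicted probabilities p_i = P(y_i = 1)

# Per-sample loss: when y_i = 1 only -log(p_i) survives,
# when y_i = 0 only -log(1 - p_i) survives.
per_sample = -(y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))
print(per_sample)         # approx. [0.105, 1.609, 0.105, 1.609]
print(per_sample.mean())  # the log loss, approx. 0.857

# scikit-learn computes the same average
print(log_loss(y_true, p_pred))
```

If the original figure is unavailable, a rough sketch along these lines (assuming matplotlib is available) reproduces the two curves: the loss is near zero when the prediction agrees with the label and grows without bound as it moves toward the wrong label.

```python
import numpy as np
import matplotlib.pyplot as plt

p = np.linspace(0.001, 0.999, 500)   # predicted probability of the positive class
plt.plot(p, -np.log(p), label="y = 1: loss = -log(p)")
plt.plot(p, -np.log(1 - p), label="y = 0: loss = -log(1 - p)")
plt.xlabel("predicted probability p = P(y = 1)")
plt.ylabel("per-sample log loss")
plt.legend()
plt.show()
```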