Tags: r, algorithm, machine-learning, classification

How to compute log loss in machine learning


The following code produces the probability output of a binary classification with Random Forest.

library(randomForest) 

rf <- randomForest(train, train_label,importance=TRUE,proximity=TRUE)
prediction<-predict(rf, test, type="prob")

The resulting prediction is a matrix of class probabilities, with one column per class (shown as an image in the original post).

The true labels of the test data are known (named test_label). Now I want to compute the logarithmic loss for the probability output of the binary classification. The LogLoss function is as follows.

LogLoss=function(actual, predicted)
{
  result=-1/length(actual)*(sum((actual*log(predicted)+(1-actual)*log(1-predicted))))
  return(result)
}

How can I compute the logarithmic loss from this probability output? Thank you.


Solution

  • library(randomForest) 
    
    rf <- randomForest(Species~., data = iris, importance=TRUE, proximity=TRUE)
    prediction <- predict(rf, iris, type="prob")
    # clip the probabilities away from 0 and 1, otherwise log() can return -Inf
    prediction <- apply(prediction, c(1,2), function(x) min(max(x, 1E-15), 1-1E-15)) 
    
    # model.matrix(~ actual + 0) builds a 0/1 indicator matrix with one column per class;
    # its 1 entries mark the true class of each row, so indexing pred with them
    # picks out the probability assigned to the correct class
    logLoss = function(pred, actual){
      -1*mean(log(pred[model.matrix(~ actual + 0) == 1]))
    }
    
    logLoss(prediction, iris$Species)
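
For the binary case in the question, the original LogLoss formula can probably be reused directly: take the probability column of the class treated as positive and turn the labels into 0/1. The following is only a minimal sketch under assumptions. The small prediction matrix stands in for the output of predict(rf, test, type="prob"), and the class names "neg" and "pos", the numbers, and the eps argument are made up for illustration.

```r
# Log loss for a binary classifier, using the formula from the question.
# actual:    numeric vector of 0/1 labels
# predicted: predicted probability of the positive class, one value per row
LogLoss <- function(actual, predicted, eps = 1e-15) {
  # Clip probabilities so that log(0) never produces -Inf
  predicted <- pmin(pmax(predicted, eps), 1 - eps)
  -mean(actual * log(predicted) + (1 - actual) * log(1 - predicted))
}

# Hypothetical stand-in for predict(rf, test, type = "prob"):
# one row per test observation, one column per class.
prediction <- matrix(c(0.9, 0.1,
                       0.2, 0.8,
                       0.4, 0.6),
                     ncol = 2, byrow = TRUE,
                     dimnames = list(NULL, c("neg", "pos")))
test_label <- factor(c("neg", "pos", "neg"), levels = c("neg", "pos"))

positive <- "pos"                               # class treated as 1 (assumption)
actual   <- as.numeric(test_label == positive)  # 1 for positive, 0 otherwise
p_pos    <- prediction[, positive]              # predicted P(positive) per row

binary_loss <- LogLoss(actual, p_pos)
print(binary_loss)
```

With real data, prediction and test_label would come from the model and the test set, and positive should be set to whichever column name of prediction is the class of interest.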