Tags: r, classification, r-caret, gbm, multinomial

How to interpret/tune a multinomial classification with caret-GBM?


Two questions

  1. Visualizing the error of a model
  2. Calculating the log loss

(1) I'm trying to tune a multinomial GBM classifier, but I'm not sure how to interpret the output. I understand that log loss is meant to be minimized, but in the plot below it only seems to increase over the whole range of trees/iterations I tried.

library(caret)   # createDataPartition(), trainControl(), train(), mnLogLoss()

# 80/20 stratified split on the class label
inTraining <- createDataPartition(final_data$label, p = 0.80, list = FALSE)
training <- final_data[inTraining, ]
testing  <- final_data[-inTraining, ]

# repeated 10-fold CV, keeping class probabilities so mnLogLoss can be computed
fitControl <- trainControl(method = "repeatedcv", number = 10, repeats = 3,
                           verboseIter = FALSE, savePredictions = TRUE,
                           classProbs = TRUE, summaryFunction = mnLogLoss)

# tuning grid: depths 2-10, 25-250 trees, fixed shrinkage
gbmGrid1 <- expand.grid(.interaction.depth = (1:5) * 2, .n.trees = (1:10) * 25,
                        .shrinkage = 0.1, .n.minobsinnode = 10)

gbmFit1 <- train(label ~ ., data = training, method = "gbm", trControl = fitControl,
                 verbose = 1, metric = "ROC", tuneGrid = gbmGrid1)

plot(gbmFit1)

(2) On a related note, when I try to call mnLogLoss directly I get the error below, which keeps me from quantifying the error on the test set:

mnLogLoss(testing, levels(testing$label)) : 'lev' cannot be NULL

Solution

  • I suspect you set the learning rate too high, so let's use an example dataset:

    # iris as a reproducible three-class example
    final_data <- iris
    final_data$label <- final_data$Species
    final_data$Species <- NULL

    inTraining <- createDataPartition(final_data$label, p = 0.80, list = FALSE)
    training <- final_data[inTraining, ]
    testing  <- final_data[-inTraining, ]

    fitControl <- trainControl(method = "repeatedcv", number = 10, repeats = 3,
                               verboseIter = FALSE, savePredictions = TRUE,
                               classProbs = TRUE, summaryFunction = mnLogLoss)

    gbmGrid1 <- expand.grid(.interaction.depth = 1:3, .n.trees = (1:10) * 10,
                            .shrinkage = 0.1, .n.minobsinnode = 10)

    # metric = "logLoss" so train() actually selects on what mnLogLoss computes
    gbmFit1 <- train(label ~ ., data = training, method = "gbm", trControl = fitControl,
                     verbose = 1, tuneGrid = gbmGrid1, metric = "logLoss")

    plot(gbmFit1)

    

    [plot: cross-validated multinomial log loss vs. number of boosting iterations, shrinkage = 0.1]

    A bit different from yours, but you can see the upward trend after about 20 trees. It really depends on your data, but with a high learning rate you reach a minimum very quickly and anything after that just adds noise. You can see this in the illustration from Boehmke's book, and it is worth checking out a more statistics-based discussion as well.

    [illustration from Boehmke's book: effect of the learning rate on the training loss curve]
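
    If you want the error as numbers rather than a picture, the cross-validated log loss for every candidate is stored on the train object. A quick sketch, assuming gbmFit1 fitted as above (the logLoss column name comes from using mnLogLoss as the summary function):

    # resampled log loss per tuning combination, best first
    head(gbmFit1$results[order(gbmFit1$results$logLoss),
                         c("shrinkage", "interaction.depth", "n.trees", "logLoss")])

    # the combination caret selected as the minimizer
    gbmFit1$bestTune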

    Let's lower the learning rate and you can see:

    gbmGrid1 <- expand.grid(.interaction.depth = 1:3, .n.trees = (1:10) * 10,
                            .shrinkage = 0.01, .n.minobsinnode = 10)

    gbmFit1 <- train(label ~ ., data = training, method = "gbm", trControl = fitControl,
                     verbose = 1, tuneGrid = gbmGrid1, metric = "logLoss")

    plot(gbmFit1)
    

    [plot: cross-validated multinomial log loss vs. number of boosting iterations, shrinkage = 0.01]

    Note that with the lower learning rate you will most likely need more boosting iterations (a wider .n.trees grid) to reach a loss as low as the minimum you saw with the first fit.
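
    For (2), mnLogLoss() is written as a caret summary function, so it cannot be called on the raw test frame: it expects a data frame with an obs factor column plus one numeric probability column per class, and the class levels passed through lev. A minimal sketch of scoring the held-out set that way, assuming gbmFit1 and testing from above (predict(..., type = "prob") returns probability columns named after the class levels):

    # class probabilities for the held-out rows
    probs   <- predict(gbmFit1, newdata = testing, type = "prob")
    # mnLogLoss() looks for the observed labels in a column named 'obs'
    eval_df <- data.frame(obs = testing$label, probs)
    mnLogLoss(eval_df, lev = levels(testing$label))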