
Using caret to optimize for deviance with binary classification


(example borrowed from "Fatal error with train() in caret on Windows 7, R 3.0.2, caret 6.0-21")

I have this example:

library("AppliedPredictiveModeling")
library("caret")

data("AlzheimerDisease")
data <- data.frame(predictors, diagnosis)

tuneGrid <- expand.grid(interaction.depth = 1:2, n.trees = 100, shrinkage = 0.1,
                        n.minobsinnode = 10)  # n.minobsinnode is a required grid column in current caret
trainControl <- trainControl(method = "cv", number = 5, verboseIter = TRUE)

gbmFit <- train(diagnosis ~ ., data = data, method = "gbm", trControl = trainControl, tuneGrid = tuneGrid)

But let's say I want to optimize with respect to deviance (which is what I believe gbm returns by default) instead of accuracy. I know that trainControl offers a summaryFunction argument. How do I write a summaryFunction that optimizes for deviance?


Solution

  • Deviance is just (minus) twice the log-likelihood. For binomial data with a single trial, that is:

    -2 \sum_{i=1}^n \left[ y_i \log(\pi_i) + (1 - y_i) \log(1 - \pi_i) \right]
    

    where y_i is a binary indicator for the first class and \pi_i is the probability of being in the first class.
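
    As a quick sanity check, the formula can be evaluated by hand for a couple of observations (a toy example with made-up probabilities):

    > y <- c(1, 0)      # indicators for the first class
    > p <- c(0.9, 0.2)  # predicted probabilities of the first class
    > -2*sum(y*log(p) + (1 - y)*log(1 - p))
    [1] 0.6570081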

    Here is a simple example to reproduce the deviance in a GLM (by re-calculating the training set deviance):

    > library(caret)
    > set.seed(1)
    > dat <- twoClassSim(200)
    > fit1 <- glm(Class ~ ., data = dat, family = binomial)
    > ## glm() models the last class level
    > prob_class1 <- 1 - predict(fit1, dat[, -ncol(dat)], type = "response")
    > is_class1 <- ifelse(dat$Class == "Class1", 1, 0)
    > -2*sum(is_class1*log(prob_class1) + ((1-is_class1)*log(1-prob_class1)))
    [1] 112.7706
    > fit1
    
    Call:  glm(formula = Class ~ ., family = binomial, data = dat)
    <snip>  
    Degrees of Freedom: 199 Total (i.e. Null);  184 Residual
    Null Deviance:      275.3 
    Residual Deviance: 112.8    AIC: 144.8
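
    As a cross-check, stats::deviance() extracts the residual deviance directly from the fitted model and agrees with the hand calculation above:

    > deviance(fit1)
    [1] 112.7706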
    

    A basic function for train would be:

    dev_summary <- function(data, lev = NULL, model = NULL) {
      ## 0/1 indicator for the first class level
      is_class1 <- ifelse(data$obs == lev[1], 1, 0)
      ## predicted probability of the first class level
      prob_class1 <- data[, lev[1]]
    
      ## deviance plus the usual ROC/Sens/Spec from twoClassSummary()
      c(deviance = -2*sum(is_class1*log(prob_class1) + 
                            ((1-is_class1)*log(1-prob_class1))),
        twoClassSummary(data, lev = lev))
    }
    
    ctrl <- trainControl(summaryFunction = dev_summary,
                         classProbs = TRUE)
    gbm_grid <- expand.grid(interaction.depth = seq(1, 7, by = 2),
                            n.trees = seq(100, 1000, by = 50),
                            shrinkage = c(0.01, 0.1),
                            n.minobsinnode = 10)  # required grid column in current caret
    set.seed(1)
    fit2 <- train(Class ~ ., data = dat,
                  method = "gbm",
                  trControl = ctrl,
                  tuneGrid = gbm_grid,
                  metric = "deviance",
                  maximize = FALSE,  # train() maximizes by default; deviance must be minimized
                  verbose = FALSE)
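
    Assuming the fit above completes, the tuning parameters chosen by the lowest resampled deviance can be inspected via the usual train accessors:

    fit2$bestTune
    head(fit2$results[order(fit2$results$deviance), ])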
    

    Note that you will need to think of something to do if \pi_i is very near zero or one, since the log terms become infinite; one simple guard is sketched below.
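
    One simple guard (a sketch, not something caret does for you) is to clamp the probabilities away from the boundaries before taking logs, e.g. at the top of dev_summary:

    eps <- 1e-15  # arbitrary small floor; any tiny constant works
    prob_class1 <- pmin(pmax(prob_class1, eps), 1 - eps)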

    Max