Tags: r, xgboost, r-caret

How to use sum of specificity and sensitivity metric as a summary metric for train in R caret?


I use caret for xgbtree in R:

fitControl_2 <- trainControl(## 3-fold CV, repeated 2 times
  method = "repeatedcv",
  number = 3,
  repeats = 2,
  verboseIter = TRUE
)

xgboost <- train(interest_factor ~ .,
                 data = train_set_balanced,
                 method = "xgbTree",
                 trControl = fitControl_2,
                 ## Specify which metric to optimize
                 metric = "Kappa")

Is there a way to use Sensitivity + Specificity or the Youden index as the metric instead of Kappa? I understand that you can use custom summary functions, but it's not clear how to build one properly in this case.


Solution

  • Here is a summary function that uses the sum of Sens + Spec as the selection metric:

    youdenSummary <- function(data, lev = NULL, model = NULL) {
      ## the metric is only defined for two-class problems
      if (length(lev) > 2) {
        stop(paste("Your outcome has", length(lev), "levels. The youdenSummary() function isn't appropriate."))
      }
      if (!all(levels(data[, "pred"]) == lev)) {
        stop("levels of observed and predicted data do not match")
      }
      ## sensitivity of the first class, specificity of the second
      Sens <- caret::sensitivity(data[, "pred"], data[, "obs"], lev[1])
      Spec <- caret::specificity(data[, "pred"], data[, "obs"], lev[2])
      j <- Sens + Spec
      out <- c(j, Spec, Sens)
      names(out) <- c("j", "Spec", "Sens")
      out
    }
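
    A quick way to check the function outside of train() is to call it on a small hand-made data frame in the shape caret supplies (factor columns obs and pred). The fake_data below is purely illustrative and not part of the original answer:

    ## toy resample in the format caret passes to a summary function
    fake_data <- data.frame(
      obs  = factor(c("M", "M", "R", "R"), levels = c("M", "R")),
      pred = factor(c("M", "R", "R", "M"), levels = c("M", "R"))
    )
    youdenSummary(fake_data, lev = c("M", "R"))
    ## returns the named vector: j = 1.0, Spec = 0.5, Sens = 0.5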
    

    To understand why it is defined this way, please read this chapter from the caret book. In short, caret hands the summary function a data frame with obs and pred columns (the observed and predicted classes for one resample) plus the outcome levels in lev, and expects a named numeric vector back; the name you pass to metric in train() must match one of those names (here "j"). Some SO answers that might also be helpful:

    Custom Performance Function in caret Package using predicted Probability

    Additional metrics in caret - PPV, sensitivity, specificity
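
    If you want the actual Youden index reported (J = Sens + Spec - 1) rather than the plain sum, only the returned value changes; the two differ by a constant, so train() selects the same model either way. A minimal variant of the function above (my sketch, not from the original answer):

    youdenJSummary <- function(data, lev = NULL, model = NULL) {
      Sens <- caret::sensitivity(data[, "pred"], data[, "obs"], lev[1])
      Spec <- caret::specificity(data[, "pred"], data[, "obs"], lev[2])
      out <- c(Sens + Spec - 1, Spec, Sens)  ## Youden's J instead of the sum
      names(out) <- c("J", "Spec", "Sens")
      out
    }

    Use metric = "J" in train() if you go with this version.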

    Example:

    library(caret)
    library(mlbench)
    data(Sonar)
    
    fitControl <- trainControl(method = "cv",
                               number = 5,
                               summaryFunction = youdenSummary)
    fit <-  train(Class ~.,
                  data = Sonar,
                  method = "rpart", 
                  metric = "j" ,
                  tuneLength = 5,
                  trControl = fitControl)
    
    fit
    #output
    CART 
    
    208 samples
     60 predictor
      2 classes: 'M', 'R' 
    
    No pre-processing
    Resampling: Cross-Validated (5 fold) 
    Summary of sample sizes: 167, 166, 166, 166, 167 
    Resampling results across tuning parameters:
    
      cp          j         Spec       Sens     
      0.00000000  1.394980  0.6100000  0.7849802
      0.01030928  1.394980  0.6100000  0.7849802
      0.05154639  1.387708  0.6300000  0.7577075
      0.06701031  1.398629  0.6405263  0.7581028
      0.48453608  1.215457  0.3684211  0.8470356
    
    j was used to select the optimal model using the largest value.
    The final value used for the model was cp = 0.06701031.
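
    To plug this back into the xgbTree setup from the question, pass the summary function to trainControl() and point metric at the name it returns. This is a sketch under the assumptions of the question (interest_factor in train_set_balanced is a two-level factor):

    fitControl_2 <- trainControl(method = "repeatedcv",
                                 number = 3,
                                 repeats = 2,
                                 verboseIter = TRUE,
                                 summaryFunction = youdenSummary)

    xgboost <- train(interest_factor ~ .,
                     data = train_set_balanced,
                     method = "xgbTree",
                     trControl = fitControl_2,
                     ## optimize the summed Sens + Spec returned by youdenSummary
                     metric = "j")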