
What is the loss function of `varImp` in `R` package `caret`?


I'm using the varImp function from the R package caret to get the importance of variables. This is my code:

library(caret)
trctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 20,
                       search = "grid", summaryFunction = youdenSumary)

classifier <- train(form = Target ~ ., data = training_set, method = "rpart",
                    parms = list(split = "information"), trControl = trctrl,
                    tuneLength = 10, metric = "j")

importance <- varImp(classifier, scale = FALSE)

This is the resulting variable importance:

rpart variable importance

     Overall
nh   532.218
nRT  488.922
wdSu 482.582
av_t 390.266
nc   317.725
o    303.738
dt   291.488
wdMo 103.200
wdSa  49.690
ne    46.707
wdWe  41.642
nl    26.463
wdTu   9.506
wdTh   2.669

The code runs the recursive partitioning algorithm and keeps track of how much each split reduces the loss function. But what is the loss function in this case? The R documentation says:

The reduction in the loss function (e.g. mean squared error) attributed to each variable at each split is tabulated and the sum is returned. Also, since there may be candidate variables that are important but are not used in a split, the top competing variables are also tabulated at each split. This can be turned off using the maxcompete argument in rpart.control. This method does not currently provide class-specific measures of importance when the response is a factor.

It mentions the mean squared error. Is this the loss function used in this package (I'm not sure about that "e.g." in round brackets)?


Solution

  • Mean squared error is used for regression. Since you are doing classification, rpart instead offers two impurity functions, Gini and information entropy (see the long introduction to rpart for details).

    You specified:

    parms = list(split = "information")
    

    This means the tree is split on information entropy, so in your case the reduction is the reduction in information entropy. You can inspect the function caret uses with:

    caret:::varImpDependencies("rpart")$varImp
    

    It basically sums the improvement in information entropy for each split. You can roughly check this in your case by inspecting the improve column of:

    classifier$finalModel$splits
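
As a rough sketch of that check, here is the same kind of sum computed by hand. It uses the built-in iris data rather than your training_set, and fits the tree with rpart directly instead of through caret, so the numbers will not match yours. It also sets maxsurrogate = 0, which is my assumption to keep surrogate rows out of the splits matrix, so that every remaining row is a primary or competing split.

```r
library(rpart)

# Classification tree split on information entropy.
# maxsurrogate = 0 (an assumption for this sketch) means no surrogate rows
# are stored in fit$splits.
fit <- rpart(Species ~ ., data = iris, method = "class",
             parms = list(split = "information"),
             control = rpart.control(maxsurrogate = 0))

# Row names of fit$splits are the variable names; the "improve" column holds
# the improvement recorded for each primary or competing split.
splits <- data.frame(variable = rownames(fit$splits),
                     improve  = unname(fit$splits[, "improve"]))

# Sum the improvements per variable and sort in decreasing order.
importance <- aggregate(improve ~ variable, data = splits, FUN = sum)
importance <- importance[order(-importance$improve), ]
print(importance)
```

On your own model, the analogous starting point would be classifier$finalModel$splits. If surrogate splits are present there, their rows would need to be excluded before summing, because for those rows the improve column does not hold an impurity reduction.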