varImp in caret doesn't show all the categories of a predictor variable


I fitted a model with caret (method = "rf") using 3 predictor variables (1 categorical, 2 numerical) and 1 response variable (2 categories). My problem is with the categorical predictor variable "origen_flujo", which has these categories:

>table(origen_flujo)

    Lagos         Llano Precordillera         Valle 
        8            59            12            34 

I run this model:

    rf_fit <- train(clases ~ ., data = data,
                    method = "rf",
                    preProcess = c("center", "scale"),
                    tuneGrid = grid,
                    trControl = ctrl,
                    metric = "ROC")

Then I applied varImp to the resulting model, which shows:

  > varImp(rf_fit)
                      Overall
 V1                          100.00
 origen_flujoLlano           55.87
 origen_flujoPrecordillera   54.73
 V2                          26.08
 origen_flujoValle            0.00

Why doesn't varImp show "origen_flujoLagos"?

Thanks


Solution

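  • When the formula interface creates dummy variables for a factor, one level (the first) is left out and serves as the reference level. Here that level is "Lagos", so it has no dummy column and therefore no importance value.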

    EDIT - You could create your own full set of dummy variables before passing the data to the model. Alternatively, you might be better off passing the predictor as a factor by avoiding the formula interface (that is, using the x/y interface of train). That way, you get a single importance value for each predictor.
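
The points above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data frame `datos` is a made-up stand-in (only the origen_flujo level counts are taken from the question; V1, V2 and clases are random placeholders), and the tuning and resampling settings are arbitrary choices, not the ones from the question. The last part needs the caret and randomForest packages and is skipped if they are not installed.

```r
# Hypothetical stand-in data: only the origen_flujo level counts come from
# the question; V1, V2 and clases are random placeholders.
set.seed(1)
counts <- c(Lagos = 8, Llano = 59, Precordillera = 12, Valle = 34)
datos <- data.frame(
  V1 = rnorm(sum(counts)),
  V2 = rnorm(sum(counts)),
  origen_flujo = factor(rep(names(counts), times = counts)),
  clases = factor(sample(c("a", "b"), sum(counts), replace = TRUE))
)

# Default dummy coding: an intercept plus one 0/1 column per level except
# the first one ("Lagos"), which becomes the reference level.
default_dummies <- model.matrix(~ origen_flujo, data = datos)
colnames(default_dummies)

# Full set of dummies: removing the intercept keeps a column for every level.
full_dummies <- model.matrix(~ origen_flujo - 1, data = datos)
colnames(full_dummies)

# Non-formula (x/y) interface: the factor is passed as a single column,
# so varImp reports one value per predictor.
if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("randomForest", quietly = TRUE)) {
  library(caret)
  fit_xy <- train(
    x = datos[, c("V1", "V2", "origen_flujo")],
    y = datos$clases,
    method = "rf",
    tuneGrid = data.frame(mtry = 2),                      # arbitrary choice
    trControl = trainControl(method = "cv", number = 3)   # arbitrary choice
  )
  print(varImp(fit_xy))
}
```

With the default coding, the column names contain no "origen_flujoLagos", which matches the varImp output in the question. With the x/y interface, the importance table should have one row each for V1, V2 and origen_flujo instead of one row per dummy column.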