I built a model with caret (method = "rf") using 3 predictor variables (1 categorical, 2 numerical) and 1 response variable with 2 classes. My problem is with the categorical predictor "origen_flujo", which has these categories:
> table(origen_flujo)
        Lagos         Llano Precordillera         Valle
            8            59            12            34
I ran this model:
rf_fit <- train(clases ~ ., data = data,
                method = "rf",
                preProcess = c("center", "scale"),
                tuneGrid = grid,
                trControl = ctrl,
                metric = "ROC")
Then I applied varImp to the fitted model, which shows:
> varImp(rf_fit)
                          Overall
V1                         100.00
origen_flujoLlano           55.87
origen_flujoPrecordillera   54.73
V2                          26.08
origen_flujoValle            0.00
Why doesn't varImp show "origen_flujoLagos"?
Thanks
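When the formula interface creates dummy variables for a factor, one level is left out (the first level, here "Lagos"). That level serves as the reference, so it has no column of its own and therefore no importance value.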
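A minimal base-R sketch of this behaviour, assuming a toy factor rebuilt from the counts in the question (the data frame `toy` is made up for illustration and is not the original data). It expands the factor with model.matrix() under R's default treatment contrasts, which is the kind of expansion a formula triggers:

```r
# Toy factor with the same levels and counts as in the question (illustrative only)
toy <- data.frame(
  origen_flujo = factor(rep(c("Lagos", "Llano", "Precordillera", "Valle"),
                            times = c(8, 59, 12, 34)))
)

# The first level is the reference level
ref_level <- levels(toy$origen_flujo)[1]
print(ref_level)

# Default dummy expansion: one column per level except the reference
mm <- model.matrix(~ origen_flujo, data = toy)
print(colnames(mm))
```

The column names include the intercept plus origen_flujoLlano, origen_flujoPrecordillera and origen_flujoValle, but no origen_flujoLagos, which matches the rows reported by varImp.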
EDIT - You could create your own full set of dummy variables before passing the data to the model. Alternatively, you might be better off keeping the predictor as a factor by avoiding the formula interface (passing the predictors and the response to train as x and y). That way, you get a single importance value for each predictor.
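Both options could be sketched roughly as follows. This is only an illustration under stated assumptions: the data frame `toy` is simulated (random V1, V2 and clases, with the origen_flujo counts from the question), the resampling settings are arbitrary, and the default metric is used rather than the "ROC" setup from the question. The caret part runs only if caret and randomForest are installed.

```r
set.seed(1)
n <- 113  # 8 + 59 + 12 + 34, as in the question's table

# Simulated stand-in for the real data (hypothetical values)
toy <- data.frame(
  V1 = rnorm(n),
  V2 = rnorm(n),
  origen_flujo = factor(rep(c("Lagos", "Llano", "Precordillera", "Valle"),
                            times = c(8, 59, 12, 34))),
  clases = factor(sample(c("a", "b"), n, replace = TRUE))
)

# Option 1: full set of dummies. Dropping the intercept makes
# model.matrix() keep a column for every level, including Lagos.
full_dummies <- model.matrix(~ origen_flujo - 1, data = toy)
print(colnames(full_dummies))

# Option 2: skip the formula interface so the factor stays a single predictor.
if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("randomForest", quietly = TRUE)) {
  library(caret)
  fit_xy <- train(x = toy[, c("V1", "V2", "origen_flujo")],
                  y = toy$clases,
                  method = "rf",
                  trControl = trainControl(method = "cv", number = 3))
  print(varImp(fit_xy))  # one row each for V1, V2 and origen_flujo
}
```

With option 1 the dummy columns would be bound to the numeric predictors before training, so each level gets its own importance value. With option 2 the random forest splits on the factor directly and reports one value for origen_flujo as a whole.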