Importance plot in train() method from caret package problem


I built a model using the following code. One of my predictors, "ot_soilTextu", is a categorical variable.

rf_final <- caret::train(BULK_DENSITY ~ forest_210 + legumes_158 + corn_147 + bio3 + 
                           ot_soilTextu + grass_122 + mrvbf + bio5 + bio15 + bio9 + 
                           grav_1st_1 + bio18 + deme2000 + grav_1st_2,
                         method = "rf",
                         data=cq,
                         tuneGrid = expand.grid(mtry = rf_CV$bestTune$mtry), 
                         trControl = trainControl(method = "none"), 
                         importance = TRUE)

Then I made the importance plot using the following code.

imp <- varImp(rf_final)
plot(imp, main="BD: 14V ",xlab = list(font=1, cex = 1.25), 
     scales = list(x = list(font=1,cex=1),y=list(font=1,cex=1)))

The importance plot shows a separate entry for every level of the categorical variable, which is not what I want. I need to see the importance of the 14 variables only, without the individual levels of the categorical variable. Is there a way to solve this without writing extra code? Here is the importance plot: [image]


Solution

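  • You can convert your categorical variable to numeric before calling caret::train, for example

    cq$ot_soilTextu <- as.numeric(as.factor(cq$ot_soilTextu))

    The as.factor() step matters if the column is stored as character, because as.numeric() on a character column returns NA instead of level codes. After this, caret::train sees a single numeric column, so the plot reports the importance of the 14 variables only. Be aware that the model then treats the soil texture classes as ordered integer codes rather than as unordered categories.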
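If recoding the variable as numeric is not acceptable, another option is to avoid the formula interface. With a formula, train() expands factors into dummy columns, which is why each level gets its own importance value; with the x/y interface the factor is handed to randomForest as a single column. The following is only a sketch: it uses the built-in iris data as a stand-in for cq (Species plays the role of ot_soilTextu) and a fixed mtry of 2 as a placeholder for rf_CV$bestTune$mtry, and it assumes the caret and randomForest packages are installed.

```r
library(caret)  # method = "rf" also needs the randomForest package

set.seed(1)

# Stand-in data (assumption): Species is a factor predictor, like ot_soilTextu
predictors <- iris[, c("Sepal.Width", "Petal.Length", "Petal.Width", "Species")]

# Formula interface: the factor is expanded into one dummy column per level
rf_formula <- train(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species,
                    data = iris,
                    method = "rf",
                    tuneGrid = data.frame(mtry = 2),   # placeholder value
                    trControl = trainControl(method = "none"),
                    importance = TRUE)

# x/y interface: the factor stays a single predictor
rf_xy <- train(x = predictors,
               y = iris$Sepal.Length,
               method = "rf",
               tuneGrid = data.frame(mtry = 2),        # placeholder value
               trControl = trainControl(method = "none"),
               importance = TRUE)

imp_formula <- varImp(rf_formula)
imp_xy <- varImp(rf_xy)

# Compare the variable names each interface reports
print(rownames(imp_formula$importance))
print(rownames(imp_xy$importance))

plot(imp_xy, main = "Importance, one entry per variable")
```

For the model in the question, this would mean passing the 14 predictor columns of cq as x (with ot_soilTextu stored as a factor) and cq$BULK_DENSITY as y. Note that mtry then counts the factor as one predictor instead of several dummy columns, so the tuned value may need to be revisited.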