I am working with the varImp() function from the caret package.
I fit a tree and then use varImp() to see which variables are most important. I would like to extract the names of the most important variables from the output of varImp(). However, the output appears to be a list, and I can't find a way to get the variable names, only the numerical scores showing how important each variable is.
I have tried converting the output to a data frame and using names(), but neither gives me the variable names.
Here's an example:
> # Packages: nlme (Orthodont data), rpart (rpart()), caret (varImp())
> # Sample data
> head(Orthodont)
Grouped Data: distance ~ age | Subject
distance age Subject Sex
1 26.0 8 M01 Male
2 25.0 10 M01 Male
3 29.0 12 M01 Male
4 31.0 14 M01 Male
5 21.5 8 M02 Male
6 22.5 10 M02 Male
> sample_tree <- rpart(distance ~ ., data = Orthodont)
> varImp(sample_tree)
Overall
age 1.1178243
Sex 0.5457834
Subject 2.8446154
> names(varImp(sample_tree))
[1] "Overall"
> as.data.frame(varImp(sample_tree))
Overall
age 1.1178243
Sex 0.5457834
Subject 2.8446154
> # What I want are the names of the two most important variables.
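The variable names you're looking for are stored in the rownames() of the object. For an rpart model, varImp() returns a data frame, and because a data frame is a list of columns, names() only returns the column name ("Overall"). as.data.frame() doesn't change this, since the object is already a data frame.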
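imp <- varImp(sample_tree)  # data frame: one "Overall" column, variables as row names
rownames(imp)[order(imp$Overall, decreasing = TRUE)]  # names sorted by score, highest first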
Output:
[1] "Subject" "age" "Sex"
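If you want to check this logic without caret installed, here is a minimal base-R sketch. It assumes varImp() returns a plain data frame for an rpart model (which the names() and as.data.frame() output in the question suggests) and rebuilds that data frame by hand from the printed scores; `ranked` is just an illustrative variable name, and nothing here calls caret.

```r
# Stand-in for varImp(sample_tree): a data frame with one "Overall" column
# and the variable names as row names (scores copied from the question).
imp <- data.frame(
  Overall   = c(1.1178243, 0.5457834, 2.8446154),
  row.names = c("age", "Sex", "Subject")
)

# A data frame is a list of columns, so names() only sees the column name.
is.list(imp)    # TRUE
names(imp)      # "Overall"

# The variable names are stored as row names instead.
rownames(imp)   # "age" "Sex" "Subject"

# order() gives row positions sorted by score, highest first;
# indexing the row names with it ranks the variables.
ranked <- rownames(imp)[order(imp$Overall, decreasing = TRUE)]
ranked          # "Subject" "age" "Sex"
```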
So the two most important variables, according to these scores, are:
rownames(imp)[order(imp$Overall, decreasing=TRUE)[1:2]]
Which gives:
[1] "Subject" "age"
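If you need this more than once, you could wrap it in a small function. The sketch below uses a hypothetical helper, top_vars(), which is not part of caret; the `column` argument is my own assumption, there only in case your importance data frame has a column other than "Overall" that you want to rank by. It uses head() instead of [1:n] so that asking for more variables than exist returns the available names rather than padding with NA.

```r
# Hypothetical helper (not part of caret): return the names of the n most
# important variables from a varImp()-style data frame whose row names
# are the variable names.
top_vars <- function(imp, n = 2, column = "Overall") {
  stopifnot(column %in% names(imp))
  ord <- order(imp[[column]], decreasing = TRUE)
  # head() returns at most n names, so a large n does not produce NAs
  head(rownames(imp)[ord], n)
}

# Same stand-in data frame as above, rebuilt from the question's scores
imp <- data.frame(
  Overall   = c(1.1178243, 0.5457834, 2.8446154),
  row.names = c("age", "Sex", "Subject")
)

top_vars(imp)         # "Subject" "age"
top_vars(imp, n = 5)  # "Subject" "age" "Sex"
```

With the real model you would call top_vars(varImp(sample_tree)) instead of building `imp` by hand.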