
Get the most important variable names from varImp()


I am working with the function varImp() from the caret package.

I fit a tree with rpart() and then use varImp() to see which variables are most important. I would like to extract the names of the most important variables from its output, but the output appears to be a list, and I can't find a way to get the variable names, only the numerical importance scores.

I have tried converting the output to a data frame and calling names() on it, but neither approach gives me the variable names.

Here's an example:

> # Sample data
> head(Orthodont)
Grouped Data: distance ~ age | Subject
  distance age Subject  Sex
1     26.0   8     M01 Male
2     25.0  10     M01 Male
3     29.0  12     M01 Male
4     31.0  14     M01 Male
5     21.5   8     M02 Male
6     22.5  10     M02 Male
> sample_tree <- rpart(distance ~ ., data = Orthodont)
> varImp(sample_tree)
          Overall
age     1.1178243
Sex     0.5457834
Subject 2.8446154
> names(varImp(sample_tree))
[1] "Overall"
> as.data.frame(varImp(sample_tree))
          Overall
age     1.1178243
Sex     0.5457834
Subject 2.8446154
> # What I want are the names of the two most important variables.

Solution

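  • The names you're looking for are in the rownames() of the object. For an rpart fit, varImp() returns a data frame whose row names are the predictor names and whose single column, Overall, holds the importance scores. names() on a data frame returns its column names, which is why it only gave you "Overall".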

    imp <- varImp(sample_tree)
    rownames(imp)[order(imp$Overall, decreasing=TRUE)]
    

    Output:

    [1] "Subject" "age"     "Sex"
    

    So the two most important variables, according to these scores, are:

    rownames(imp)[order(imp$Overall, decreasing=TRUE)[1:2]]
    

    Which gives:

    [1] "Subject" "age"
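If you need this ranking more than once, it can be wrapped in a small function. The sketch below uses base R only: instead of refitting the tree, it builds a data frame that mirrors the varImp() output printed in the question (scores copied from there), so it runs without rpart, caret, or nlme installed. The helper name top_vars() is hypothetical and not part of caret.

```r
# A data frame shaped like the varImp() output shown in the question.
# In real use, `imp` would be varImp(sample_tree) instead.
imp <- data.frame(
  Overall = c(1.1178243, 0.5457834, 2.8446154),
  row.names = c("age", "Sex", "Subject")
)

names(imp)     # column names: "Overall"
rownames(imp)  # predictor names: "age" "Sex" "Subject"

# Hypothetical helper: names of the n highest-scoring predictors.
top_vars <- function(imp, n = 2) {
  # Sort the row names by the Overall score, largest first
  ranked <- rownames(imp)[order(imp$Overall, decreasing = TRUE)]
  # head() returns at most n elements, never NA padding
  head(ranked, n)
}

top_vars(imp, 2)  # "Subject" "age"

# Alternatively, keep the scores next to the names by sorting the rows.
# drop = FALSE keeps the result a data frame even with a single column.
ranked_df <- imp[order(imp$Overall, decreasing = TRUE), , drop = FALSE]
head(ranked_df, 2)
```

head() is used rather than [1:n] so that asking for more variables than exist returns all of them instead of padding the result with NA. If the model was instead fit with caret's train(), varImp() returns a "varImp.train" object whose importance element holds a data frame like this one, so, assuming it has an Overall column, you would pass varImp(fit)$importance to the helper.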