Tags: r, dataframe, tidyverse, feature-selection, tidymodels

How to Get Variable/Feature Importance From Tidymodels ranger object?


I have a ranger object from the tidymodels rand_forest function:

rf <- rand_forest(mode = "regression", trees = 1000) %>% fit(pay_rate ~ age+profession)

I want to get the feature importance of each variable (I have many more than in this example). I've tried things like rf$variable.importance, or importance(rf), but the former returns NULL and the latter function doesn't exist. I tried using the vip package, but that doesn't work for a ranger object. How can I extract feature importances from this object?


Solution

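  • ranger does not compute variable importance unless you ask for it, so you need to pass importance = "impurity" (or "permutation") when you set the engine. Once this is set, the fitted model stores variable importance scores, and you can pull the parsnip fit out with extract_fit_parsnip() and pass it to vip() to plot them.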

    A small example:

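    library(tidymodels)
    library(vip)
    
    # Request impurity-based importance from the ranger engine
    rf_mod <- rand_forest(mode = "regression", trees = 100) %>% 
      set_engine("ranger", importance = "impurity")
      
    # Predict mpg from all other columns of mtcars
    rf_recipe <- 
      recipe(mpg ~ ., data = mtcars) 
    
    rf_workflow <- 
      workflow() %>% 
      add_model(rf_mod) %>% 
      add_recipe(rf_recipe)
    
    # Fit the workflow, extract the parsnip fit, and plot the top 10 features
    rf_workflow %>% 
      fit(mtcars) %>% 
      extract_fit_parsnip() %>% 
      vip(num_features = 10)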
    

    More information is available in the tidymodels Get Started guide.
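
If you want the importance scores as numbers rather than a plot, one option is to pull the underlying ranger object out of the fit and read its variable.importance element. The sketch below assumes a parsnip version that provides extract_fit_engine() and reuses mtcars for illustration; the names rf_fit, imp and imp_tbl are made up for this example. It also suggests why rf$variable.importance returned NULL in the question: the parsnip fit wraps the ranger object rather than being one, and ranger stores no scores unless importance is set on the engine.

```r
library(tidymodels)

set.seed(123)  # random forests are stochastic; fix the seed for repeatable scores

# Fit directly with parsnip; importance must be requested on the engine
rf_fit <- rand_forest(mode = "regression", trees = 100) %>%
  set_engine("ranger", importance = "impurity") %>%
  fit(mpg ~ ., data = mtcars)

# The ranger object lives inside the parsnip fit; its variable.importance
# element is a named numeric vector with one entry per predictor
imp <- extract_fit_engine(rf_fit)$variable.importance

# Tidy into a tibble sorted from most to least important
imp_tbl <- tibble::enframe(imp, name = "variable", value = "importance") %>%
  arrange(desc(importance))

imp_tbl
```

With a fitted workflow, calling extract_fit_engine() on the workflow object should return the same kind of ranger object, so the last two steps would apply unchanged.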