Tags: r, machine-learning, neural-network, tidymodels

Is there a way to get predictor importance from a neural network created with tidymodels?


I am currently using tidymodels to build a supervised classification model. The aim is to predict a binary outcome (variable "broadGroup") from a large number of predictors (498). I have tried several engines and algorithms. For most models I had no trouble getting variable importance with the vip package, but it does not work for the neural network. I would really like to obtain it, because the neural network is the best-performing model so far (models were ranked by AUROC).
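
For context, this is roughly the kind of call that gives me importance for the other engines. The rf_wflow object below is only illustrative (one of the other workflows, not shown here), and for the ranger engine it assumes importance = "impurity" was set in set_engine():

#illustrative only: engine-level importance for one of the other models
library(vip)

rf_wflow %>%
  fit(data = ap_train) %>%
  extract_fit_parsnip() %>%
  vip(num_features = 20)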

Here is the code used to create the model:

#creating recipe
ap_recipe <- recipe(broadGroup ~ ., data = ap_train) %>%
  step_normalize(all_numeric()) %>% 
  step_dummy(all_nominal_predictors())

#creating bootstrap resamples for tuning
ap_boot <- bootstraps(ap_train, times = 3, strata = broadGroup)

#Creating Neural Network model
nnet_model <- mlp(hidden_units = tune(), 
                  penalty = tune(),
                  epochs = tune(),
                  activation = "relu") %>% 
  set_mode("classification") %>%
  set_engine("keras", verbose = FALSE)

#available parameters for activation: "linear", "softmax", "relu" and "elu"

#creating workflow
nnet_wflow <- workflow() %>% 
  add_recipe(ap_recipe) %>%
  add_model(nnet_model)

#creating tuning grid
nnet_grid <- grid_regular(
  hidden_units(range = c(1, 10)),
  penalty(range = c(-10, 0)), #penalty() uses a log10 scale, so this is 1e-10 to 1
  epochs(range = c(500, 1000)),
  levels = 5)

#evaluating nnet model with metrics

nnet_res <- 
  nnet_wflow %>% 
  tune_grid(
    resamples = ap_boot,
    grid = nnet_grid,
    control = control_grid(save_pred = TRUE),
    metrics = metric_set(kap, recall, precision, f_meas, accuracy, roc_auc)
  )

nnet_best <- 
  nnet_res %>% 
  select_best("roc_auc")

#showing performance across all resamples
nnet_res %>%  collect_metrics(summarize = TRUE)

#collecting predictions
nnet_predict <- nnet_res %>%
  collect_predictions(parameters = nnet_best)

save(nnet_predict,nnet_res,nnet_best, file = "./data/model/apnnet.RData")

#generating confusion matrix
nnet_predict %>% conf_mat(broadGroup, .pred_class) 

My current idea is to loop over the predictors and, for each one, train a model using only that predictor and another model using all predictors except that one, collect the metrics each time, and compare the resulting change in AUROC for each variable (a rough, untested sketch of the drop-one half of that loop is shown below). This would, however, take a very long time given the number of predictors, and I am not sure it is even correct, since each iteration builds a new model rather than interrogating the already trained one. Do you know of another way to do this?
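
Here is roughly what I mean. It is only an untested sketch that reuses the objects defined above (ap_train, nnet_wflow, nnet_best), and I have only written out the drop-one-predictor part:

#rough, untested sketch of the drop-one-predictor loop
library(purrr)

predictors <- setdiff(names(ap_train), "broadGroup")

drop_one_auc <- map_dfr(predictors, function(var) {
  data_minus <- dplyr::select(ap_train, -dplyr::all_of(var))

  #rebuild the recipe without this predictor and reuse the tuned parameters
  wflow_minus <- nnet_wflow %>%
    update_recipe(
      recipe(broadGroup ~ ., data = data_minus) %>%
        step_normalize(all_numeric()) %>%
        step_dummy(all_nominal_predictors())
    ) %>%
    finalize_workflow(nnet_best)

  #re-evaluate on fresh bootstrap resamples and keep the AUROC
  wflow_minus %>%
    fit_resamples(
      resamples = bootstraps(data_minus, times = 3, strata = broadGroup),
      metrics = metric_set(roc_auc)
    ) %>%
    collect_metrics() %>%
    dplyr::mutate(dropped = var)
})

#predictors whose removal lowers AUROC the most would be ranked as most important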


Solution

  • If you want model-agnostic estimates of importance, you can use the DALEXtra package.

    Chapter 18 of the tidymodels book has a good tutorial on that (a rough sketch of that approach is included after the reprex below).

    If you want a model-specific importance score, the new version of baguette has a function that works with the nnet package:

    library(tidymodels)
    library(baguette)
    
    
    tidymodels_prefer()
    theme_set(theme_bw())
    options(pillar.advice = FALSE, pillar.min_title_chars = Inf)
    
    
    data(two_class_dat)
    
    nnet_fit <- 
      mlp() %>% 
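      # the default engine for mlp() is "nnet"; nnet_imp_garson() expects an nnet object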
      set_mode("classification") %>% 
      fit(Class ~ ., data = two_class_dat)
    
    nnet_fit %>% 
      extract_fit_engine() %>% 
      nnet_imp_garson()
    #> # A tibble: 2 × 2
    #>   predictor importance
    #>   <chr>          <dbl>
    #> 1 A               63.9
    #> 2 B               36.1
    

    Created on 2023-04-07 with reprex v2.0.2
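
    For the DALEXtra route mentioned above, a minimal sketch would look roughly like this. It assumes you have already finalized and fitted the workflow on the training set; the object nnet_final_fit is hypothetical and stands in for that fitted workflow:

    library(DALEXtra)
    
    # build an explainer around the fitted tidymodels workflow
    explainer <- explain_tidymodels(
      nnet_final_fit,
      data  = dplyr::select(ap_train, -broadGroup),
      y     = as.integer(ap_train$broadGroup) - 1L,  # 0/1 version of the outcome
      label = "mlp"
    )
    
    # permutation-based, model-agnostic importance: each predictor is shuffled
    # and the change in loss is measured with the already fitted model
    set.seed(1)
    nnet_importance <- model_parts(explainer, type = "variable_importance")
    plot(nnet_importance)

    Nothing has to be refit here: model_parts() only permutes columns and re-scores the existing model, which avoids the cost of the drop-one-predictor loop described in the question.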