How to tune an MLP model with more than one hidden layer within the tidymodels framework?

I am building a multilayer perceptron (MLP) model with 2 or 3 hidden layers using the brulee package within the tidymodels framework. How can I tune the hyperparameters, including the number of hidden layers, the number of hidden units per layer, and the penalty, using tune_grid()?

library(tidymodels)
library(brulee)
data(Sacramento, package = "modeldata")

# Data splitting
set.seed(123)
data_split <- initial_split(Sacramento, prop = 0.75, strata = price)
Sac_train <- training(data_split)
Sac_test <- testing(data_split)

# Create the recipe
Sac_recipe <- recipe(price ~ ., data = Sac_train) %>% 
  step_rm(zip, latitude, longitude) %>% 
  step_corr(all_numeric_predictors(), threshold = 0.85) %>% 
  step_normalize(all_numeric_predictors()) %>%
  step_dummy(all_nominal_predictors())

An MLP model with 2 hidden layers (30 hidden units in the first and 20 in the second) can be specified as follows:

# Build the model
mlp_mod <- mlp(hidden_units = c(30, 20), penalty = tune()) %>% 
           set_engine("brulee", importance = "permutation") %>% 
           set_mode("regression")

How can I tune the number of hidden layers and the number of hidden units per layer together using tune_grid()? With hidden_units = tune(), only the number of hidden units of a single-hidden-layer MLP gets tuned. Thanks.

Solution

You can do this by building your own tuning grid in which the hidden_units column is a list column of numeric vectors. The length of each vector is the number of hidden layers, and its values are the number of hidden units in each layer.

For example, this could look like:

grid_hidden_units <- tribble(
    ~hidden_units,
    c(8, 8),
    c(8, 8, 8),
    c(16, 16),
    c(16, 16, 16),
)

grid_penalty <- tibble(penalty = c(0.01, 0.02))

grid <- grid_hidden_units |>
    crossing(grid_penalty)

grid
# # A tibble: 8 × 2
#    hidden_units penalty
#    <list>         <dbl>
# 1 <dbl [2]>       0.01
# 2 <dbl [2]>       0.02
# 3 <dbl [3]>       0.01
# 4 <dbl [3]>       0.02 
# 5 <dbl [2]>       0.01
# 6 <dbl [2]>       0.02 
# 7 <dbl [3]>       0.01
# 8 <dbl [3]>       0.02

You can put any candidate values you want in this grid: try vectors of different lengths (number of layers) and different values (units per layer).
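If you would rather not type every architecture by hand, a grid of the same shape can be generated programmatically. The following is only a sketch, assuming every hidden layer within a configuration has the same width; the names depths, widths, penalties, architectures and architecture_labels are made up for illustration and are not part of tidymodels.

```r
library(tidyr)
library(purrr)

# Hypothetical search ranges; adjust them to your problem
depths    <- c(2, 3)        # number of hidden layers
widths    <- c(8, 16)       # hidden units per layer (same width in every layer)
penalties <- c(0.01, 0.02)

# One row per (width, depth) combination
architectures <- expand_grid(width = widths, depth = depths)

# rep(width, depth) gives e.g. c(8, 8, 8) for width = 8 and depth = 3
hidden_units <- map2(architectures$width, architectures$depth, rep)

# Combine every architecture with every penalty; hidden_units becomes a list column
grid <- expand_grid(hidden_units = hidden_units, penalty = penalties)

# Readable labels such as "8-8-8", kept in a separate vector for printing only
architecture_labels <- vapply(grid$hidden_units, paste, character(1), collapse = "-")
```

The labels are kept outside of grid on purpose, since the columns of a tuning grid should correspond to the parameters marked with tune().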

With this grid, we can tune the model with tune_grid(). This might look like:

mlp_mod <- mlp(hidden_units = tune(), penalty = tune()) |>
    set_engine("brulee", importance = "permutation") |>
    set_mode("regression")

mlp_workflow <- workflow() |>
    add_recipe(Sac_recipe) |>
    add_model(mlp_mod)

mlp_tune <- tune_grid(
        mlp_workflow,
        resamples = vfold_cv(Sac_train, v = 2),
        grid = grid
    )

Then you can inspect the results with collect_metrics(mlp_tune), or by looking at mlp_tune$.metrics directly. It is also easy to extract the best hyperparameter combination:

# Name the metric argument; recent versions of tune require this
mlp_best <- mlp_tune |> select_best(metric = "rmse")

mlp_best
# # A tibble: 1 × 3
#   hidden_units penalty .config             
#   <list>         <dbl> <chr>               
# 1 <dbl [3]>       0.01 Preprocessor1_Model3

mlp_best$hidden_units
# [[1]]
# [1] 8 8 8

In this example, the best hyperparameters are 3 hidden layers with 8 hidden units each and a penalty of 0.01. This successfully tunes both the number of hidden layers and the number of hidden units!

To update the workflow, I found it easiest to pass the hyperparameters as a list: brulee doesn't like a tibble in which one column is a list, which is what we get with multiple hidden layers. So you can do:

# Turn best hyperparameters into list
mlp_best_list <- mlp_best |> as.list()
mlp_best_list$hidden_units <- mlp_best_list$hidden_units |> unlist()

# Update the workflow with the best hyperparameters
mlp_workflow_final <- mlp_workflow |>
    finalize_workflow(mlp_best_list)
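To see what the unlist() step changes, here is a small self-contained sketch. It rebuilds a one-row tibble with the same shape as mlp_best, using the values printed above as a stand-in for real select_best() output; best_demo and best_demo_list are hypothetical names used only for this illustration.

```r
library(tibble)

# Stand-in for the select_best() result shown above (not real tuning output)
best_demo <- tibble(
  hidden_units = list(c(8, 8, 8)),
  penalty      = 0.01,
  .config      = "Preprocessor1_Model3"
)

best_demo_list <- as.list(best_demo)

# After as.list(), hidden_units is still a list wrapping one numeric vector
is.list(best_demo_list$hidden_units)

# unlist() drops the wrapper, leaving a plain numeric vector of layer sizes
best_demo_list$hidden_units <- unlist(best_demo_list$hidden_units)
is.numeric(best_demo_list$hidden_units)
best_demo_list$hidden_units
```

After this step, hidden_units has the same form as the c(30, 20) that was passed to mlp() by hand in the question.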

Now you can fit your model and make predictions as normal.

# Fit the model
mlp_fit <- fit(mlp_workflow_final, data = Sac_train)

# Make predictions
mlp_preds <- predict(mlp_fit, Sac_test)