Tags: r, tidyverse, tidymodels, r-recipes

Tune recipe in workflow set with custom range (or value)


I'm trying to use the workflow_set() function in tidymodels to evaluate a batch of models. I understand that it is possible to modify a model specification in order to change the search range of its tuning parameters. For example, given this specification:

spec_lin <- linear_reg(penalty = tune(), mixture = tune()) %>%
  set_engine("glmnet")

I can modify the range using:

rec_base <- recipe(price ~ feat_1) %>% 
  step_novel(feat_1) %>% 
  step_other(feat_1, threshold = 0.2) %>%
  step_dummy(feat_1)

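# the tunable `mixture` belongs to the model spec, so update it there;
# the range is given as c(lower, upper)
spec_lin_param <- spec_lin %>% 
  parameters() %>% 
  update(mixture = mixture(c(0.01, 0.1)))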

I am trying to do the same with the tuning parameters in the recipe. For example:

rec_tuned <- recipe(price ~ feat_1) %>% 
  step_novel(feat_1) %>% 
  step_other(feat_1, threshold = tune()) %>%
  step_dummy(feat_1)

followed by

rec_adv_param <- rec_tuned %>% 
  parameters() %>% 
  update(threshold = threshold(c(0.1, 0.2)))

However, when I add it to the workflow set with something like

wf_set <- workflow_set(recipes, models, cross = TRUE) %>%
  option_add(param_info = rec_adv_param, id = "rec_tuned_spec_lin")

the final wf_set loses its original tuning parameters, which are replaced by only

threshold = threshold(c(0.1, 0.2))

Is there a way to add the parameter specification for the recipe to all the models in the workflow set?

Thanks


Solution

  • You can add the parameters for a recipe via option_add(), either for a single workflow by id or for all workflows if you leave id = NULL. When you go to tune or fit on resampled data, these options will be used.

    For example, if we want to try 0 to 20 PCA components (instead of the default):

    library(tidymodels)
    #> Registered S3 method overwritten by 'tune':
    #>   method                   from   
    #>   required_pkgs.model_spec parsnip
    data(Chicago)
    data("chi_features_set")
    
    time_val_split <-
       sliding_period(
          Chicago,
          date,
          "month",
          lookback = 38,
          assess_stop = 1
       )
    
    ## notice that there are no options; defaults will be used
    chi_features_set
    #> # A workflow set/tibble: 3 × 4
    #>   wflow_id         info             option    result    
    #>   <chr>            <list>           <list>    <list>    
    #> 1 date_lm          <tibble [1 × 4]> <opts[0]> <list [0]>
    #> 2 plus_holidays_lm <tibble [1 × 4]> <opts[0]> <list [0]>
    #> 3 plus_pca_lm      <tibble [1 × 4]> <opts[0]> <list [0]>
    
    ## make new params
    pca_param <-
       parameters(num_comp()) %>%
       update(num_comp = num_comp(c(0, 20)))
    
    ## add new params to workflowset like this:
    chi_features_set %>%
       option_add(param_info = pca_param, id = "plus_pca_lm")
    #> # A workflow set/tibble: 3 × 4
    #>   wflow_id         info             option    result    
    #>   <chr>            <list>           <list>    <list>    
    #> 1 date_lm          <tibble [1 × 4]> <opts[0]> <list [0]>
    #> 2 plus_holidays_lm <tibble [1 × 4]> <opts[0]> <list [0]>
    #> 3 plus_pca_lm      <tibble [1 × 4]> <opts[1]> <list [0]>
    
    ## now these new parameters can be used by `workflow_map()`:
    chi_features_set %>%
       option_add(param_info = pca_param, id = "plus_pca_lm") %>%
       workflow_map(resamples = time_val_split, grid = 21, seed = 1)
    
    #> # A workflow set/tibble: 3 × 4
    #>   wflow_id         info             option    result   
    #>   <chr>            <list>           <list>    <list>   
    #> 1 date_lm          <tibble [1 × 4]> <opts[2]> <rsmp[+]>
    #> 2 plus_holidays_lm <tibble [1 × 4]> <opts[2]> <rsmp[+]>
    #> 3 plus_pca_lm      <tibble [1 × 4]> <opts[3]> <tune[+]>
    

    Created on 2021-07-30 by the reprex package (v2.0.0)
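
Applied to the question's own recipe and model, one way to keep penalty and mixture while narrowing only threshold is to start from the complete parameter set of the workflow rather than from the recipe alone. The following is a minimal sketch, not a definitive recipe: the `toy` data is hypothetical and stands in for the asker's data, the name `full_param` is made up for illustration, and it assumes a recent tidymodels release where extract_workflow() and extract_parameter_set_dials() are available (older releases used parameters(), as in the question). It also assumes that param_info has to describe every tunable parameter in the workflow, which matches the behaviour reported in the question.

```r
library(tidymodels)

# Hypothetical toy data standing in for the asker's `price` / `feat_1` columns
set.seed(1)
toy <- tibble(
  price  = rnorm(60, mean = 100, sd = 10),
  feat_1 = factor(sample(letters[1:5], 60, replace = TRUE))
)

# Recipe and model spec as in the question, with `data` supplied
rec_tuned <- recipe(price ~ feat_1, data = toy) %>%
  step_novel(feat_1) %>%
  step_other(feat_1, threshold = tune()) %>%
  step_dummy(feat_1)

spec_lin <- linear_reg(penalty = tune(), mixture = tune()) %>%
  set_engine("glmnet")

# Named lists give the workflow id "rec_tuned_spec_lin"
wf_set <- workflow_set(
  preproc = list(rec_tuned = rec_tuned),
  models  = list(spec_lin = spec_lin),
  cross   = TRUE
)

# Pull the workflow's full parameter set (penalty, mixture, threshold)
# and change only the range of the recipe's threshold
full_param <- wf_set %>%
  extract_workflow(id = "rec_tuned_spec_lin") %>%
  extract_parameter_set_dials() %>%
  update(threshold = threshold(c(0.1, 0.2)))

# Attach the complete parameter set to that one workflow
wf_set <- wf_set %>%
  option_add(param_info = full_param, id = "rec_tuned_spec_lin")
```

Printing full_param should list all three parameter ids, so the model's tuning parameters remain in the option that workflow_map() later receives. Because each workflow can have a different set of tunable parameters, this sketch builds and attaches the parameter set per id instead of leaving id = NULL.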