sparse-matrix tidymodels r-recipes

How do I add a blueprint to a workflow_set in tidymodels?


I tried to follow the examples in these two links:

Link 1 - Sparse Matrix https://www.tidyverse.org/blog/2020/11/tidymodels-sparse-support/

Link 2 - Workflow_sets https://www.tmwr.org/workflow-sets.html

I had trouble including the blueprint in the workflow set.

Link 2 defines a workflow_set like this:

no_pre_proc <- 
   workflow_set(
      preproc = list(simple = model_vars), 
      models = list(MARS = mars_spec, CART = cart_spec, CART_bagged = bag_cart_spec,
                    RF = rf_spec, boosting = xgb_spec, Cubist = cubist_spec)
   )

and link 1 adds the blueprint to a single workflow like this:

wf_sparse <- 
  workflow() %>%
  add_recipe(text_rec, blueprint = sparse_bp) %>%
  add_model(lasso_spec)
  
wf_default <- 
  workflow() %>%
  add_recipe(text_rec) %>%
  add_model(lasso_spec)

Where and how do I add the "blueprint = sparse_bp" option in the workflow_set above?

My attempt was:

no_pre_proc <- 
   workflow_set(
      preproc = list(simple = model_vars), 
      models = list(MARS = mars_spec, CART = cart_spec, CART_bagged = bag_cart_spec,
                    RF = rf_spec, boosting = xgb_spec, Cubist = cubist_spec)) %>% 
  option_add(update_blueprint(blueprint = sparse_bp))

Running the racing tune on this gave me these errors:

Error: Problem with `mutate()` column `option`.
i `option = purrr::map(option, append_options, dots)`.
x All options should be named.
Run `rlang::last_error()` to see where the error occurred

<error/rlang_error>
There were 9 workflows that had no results.
Backtrace:
 1. ggplot2::autoplot(...)
 2. workflowsets:::autoplot.workflow_set(...)
 3. workflowsets:::rank_plot(...)
 4. workflowsets:::pick_metric(object, rank_metric, metric)
 6. workflowsets:::collect_metrics.workflow_set(x)
 7. workflowsets:::check_incompete(x, fail = TRUE)
 8. workflowsets:::halt(msg)
Run `rlang::last_trace()` to see the full context.
> rlang::last_trace()
<error/rlang_error>
There were 9 workflows that had no results.
Backtrace:
    x
 1. +-ggplot2::autoplot(...)
 2. \-workflowsets:::autoplot.workflow_set(...)
 3.   \-workflowsets:::rank_plot(...)
 4.     \-workflowsets:::pick_metric(object, rank_metric, metric)
 5.       +-tune::collect_metrics(x)
 6.       \-workflowsets:::collect_metrics.workflow_set(x)
 7.         \-workflowsets:::check_incompete(x, fail = TRUE)
 8.           \-workflowsets:::halt(msg)
> 

thanks,


Solution

  • Thank you for asking this question; we definitely are not supporting this use case (passing non-default arguments to the recipe or model) very well right now. We've opened an issue here where you can track our work on this.

    In the meantime, you could try a bit of a hacky workaround by manually using update_recipe() on the workflow you are interested in:

    library(tidymodels)
    #> Registered S3 method overwritten by 'tune':
    #>   method                   from   
    #>   required_pkgs.model_spec parsnip
    
    data(parabolic)
    set.seed(1)
    split <- initial_split(parabolic)
    train_set <- training(split)
    test_set <- testing(split)
    
    glmnet_spec <- 
      logistic_reg(penalty = 0.1, mixture = 0) %>%
      set_engine("glmnet")
    
    rec <-
      recipe(class ~ ., data = train_set) %>%
      step_YeoJohnson(all_numeric_predictors())
    
    sparse_bp <- hardhat::default_recipe_blueprint(composition = "dgCMatrix")
    
    wfs_orig <-
      workflow_set(
        preproc = list(yj = rec, 
                       norm = rec %>% step_normalize(all_numeric_predictors())),
        models = list(regularized = glmnet_spec)
      ) 
    
    new_wf <- 
      wfs_orig %>% 
      extract_workflow("yj_regularized") %>% 
      update_recipe(rec, blueprint = sparse_bp)
    

    Created on 2021-12-09 by the reprex package (v2.0.1)

    Then (I know this feels hacky for now) manually take this new_wf and put it into the wfs_orig$info[[1]]$workflow slot, replacing the workflow that is stored there.
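
The replacement step described above could look roughly like the sketch below. It rebuilds a cut-down version of the reprex (one recipe instead of two) so it runs on its own. The names `idx`, `wfs_new`, and `stored_bp` are made up for illustration. It also assumes that the `workflow` column inside each `info` tibble is a list-column holding a single workflow, and that the blueprint can be read back from the internal `$pre$actions$recipe$blueprint` field; both are implementation details that may differ between package versions.

```r
library(tidymodels)

data(parabolic)
set.seed(1)
train_set <- training(initial_split(parabolic))

glmnet_spec <-
  logistic_reg(penalty = 0.1, mixture = 0) %>%
  set_engine("glmnet")

rec <-
  recipe(class ~ ., data = train_set) %>%
  step_YeoJohnson(all_numeric_predictors())

sparse_bp <- hardhat::default_recipe_blueprint(composition = "dgCMatrix")

wfs_orig <-
  workflow_set(
    preproc = list(yj = rec),
    models = list(regularized = glmnet_spec)
  )

# Same workaround as above: pull the workflow out and swap in the blueprint.
new_wf <-
  wfs_orig %>%
  extract_workflow("yj_regularized") %>%
  update_recipe(rec, blueprint = sparse_bp)

# Look the row up by id instead of hard-coding position 1.
idx <- which(wfs_orig$wflow_id == "yj_regularized")

# Work on a copy so the original workflow set stays untouched.
wfs_new <- wfs_orig

# Assumption: `workflow` is a list-column with one element per row of `info`,
# so the stored workflow itself is element [[1]] of that column.
wfs_new$info[[idx]]$workflow[[1]] <- new_wf

# Read the blueprint back (internal structure, may change between versions).
stored_bp <-
  extract_workflow(wfs_new, "yj_regularized")$pre$actions$recipe$blueprint
stored_bp$composition
```

If the assignment worked, `stored_bp$composition` should report the sparse composition set in `sparse_bp`, and `wfs_new` can then be passed to `workflow_map()` in place of `wfs_orig`.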