r tidymodels r-recipes

step mutate fixed value to a list of variables in tidymodels


I wonder whether it is possible to mutate variables inside my recipe by taking a list of variables and imputing a fixed value (-12345) wherever an NA is found.

No success so far.

my_list <- c("impute1", "impute2", "impute3")

recipe <- 
  recipes::recipe(target ~ ., data = data_train) %>%
  recipes::step_naomit(everything(), skip = TRUE) %>% 
  recipes::step_rm(c(v1, v2, id, id2 )) %>%
  recipes::step_mutate_at(my_list, if_else(is.na(.), -12345, . ))

Error in step_mutate_at_new(terms = ellipse_check(...), fn = fn, trained = trained, : argument "fn" is missing, with no default


Solution

  • You were on the right track. A couple of notes: to make recipes::step_mutate_at() work you need two things, a selection of variables to be transformed and one or more functions to apply to that selection. The functions should be passed to the fn argument, either as a function (named or anonymous) or as a named list of functions.

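    Your call failed because the if_else() expression was passed positionally, so it was captured by ... as another selector and fn was left empty, hence the "argument "fn" is missing" error. Setting fn = ~if_else(is.na(.), -12345, .) in step_mutate_at(), using the ~fun(.) lambda style, should fix your problem. Furthermore, I used all_of(my_list) instead of my_list to avoid ambiguous selection when referring to an external character vector.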

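    Lastly, note that step_naomit() drops the observations that contain missing values. Because it comes before the imputation step in your recipe, those rows are removed from the training data before there is anything left to impute, which is probably undesirable since you are imputing the missing values. With skip = TRUE the step is skipped when baking new data.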
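
    To make the step_naomit() point concrete, here is a minimal sketch, assuming the same mtcars-based toy data as in the reprex below. The object names rec_naomit, prepped, n_train and n_new are only illustrative and not part of the original question.

    ```r
    library(recipes)

    # Toy data: the first row has missing values in three columns
    mtcars1 <- mtcars
    mtcars1[1, 1:3] <- NA

    # A recipe that only drops incomplete rows (hypothetical name rec_naomit)
    rec_naomit <-
      recipe(drat ~ ., data = mtcars1) %>%
      step_naomit(everything(), skip = TRUE)

    prepped <- prep(rec_naomit)

    # Training data: the step is applied during prep(), so the NA row is gone
    n_train <- nrow(bake(prepped, new_data = NULL))

    # New data: the step is skipped because skip = TRUE, so the NA row stays
    n_new <- nrow(bake(prepped, new_data = mtcars1))

    c(train = n_train, new = n_new)
    ```

    If an imputation step followed step_naomit() here, it would have nothing to impute in the training data, which is why the reprex below leaves step_naomit() out.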

    library(recipes)
    
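    # Copy mtcars and inject missing values into the first row
    mtcars1 <- mtcars
    mtcars1[1, 1:3] <- NA
    
    # Columns to impute, held as an external character vector
    my_list <- c("mpg", "cyl", "disp")
    
    # fn must be passed by name; all_of() selects the columns named in my_list
    recipe <- 
      recipe(drat ~ ., data = mtcars1) %>%
      step_mutate_at(all_of(my_list), fn = ~if_else(is.na(.), -12345, .))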
    
    recipe %>%
      prep() %>%
      bake(new_data = NULL)
    #> # A tibble: 32 x 11
    #>         mpg    cyl    disp    hp    wt  qsec    vs    am  gear  carb  drat
    #>       <dbl>  <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    #>  1 -12345   -12345 -12345    110  2.62  16.5     0     1     4     4  3.9 
    #>  2     21        6    160    110  2.88  17.0     0     1     4     4  3.9 
    #>  3     22.8      4    108     93  2.32  18.6     1     1     4     1  3.85
    #>  4     21.4      6    258    110  3.22  19.4     1     0     3     1  3.08
    #>  5     18.7      8    360    175  3.44  17.0     0     0     3     2  3.15
    #>  6     18.1      6    225    105  3.46  20.2     1     0     3     1  2.76
    #>  7     14.3      8    360    245  3.57  15.8     0     0     3     4  3.21
    #>  8     24.4      4    147.    62  3.19  20       1     0     4     2  3.69
    #>  9     22.8      4    141.    95  3.15  22.9     1     0     4     2  3.92
    #> 10     19.2      6    168.   123  3.44  18.3     1     0     4     4  3.92
    #> # … with 22 more rows
    

    Created on 2021-06-21 by the reprex package (v2.0.0)
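
    As a possible alternative, the same imputation can be sketched with step_mutate() and dplyr::across() instead of step_mutate_at(). This assumes a dplyr version that provides across() (1.0.0 or later) and, to my understanding, matches what newer recipes documentation recommends over step_mutate_at(); please check that against your installed version. The names rec_across and baked are only illustrative.

    ```r
    library(recipes)

    # Same toy data as above: the first row has NAs in mpg, cyl and disp
    mtcars1 <- mtcars
    mtcars1[1, 1:3] <- NA

    my_list <- c("mpg", "cyl", "disp")

    # across() applies the lambda to every column named in my_list
    rec_across <-
      recipe(drat ~ ., data = mtcars1) %>%
      step_mutate(across(all_of(my_list), ~ if_else(is.na(.x), -12345, .x)))

    baked <- rec_across %>%
      prep() %>%
      bake(new_data = NULL)

    # Inspect the imputed columns of the first row
    baked[1, my_list]
    ```

    Note that my_list is looked up in the calling environment when the recipe is prepped and baked, so this sketch assumes the recipe is used in the same session in which my_list is defined.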