r dplyr tidymodels

dplyr: how to use case_when to select a column?


I would like to use case_when from dplyr in order to select a column to change its role for a tidymodels recipe.

What am I doing wrong? In the following MWE an ID-role should be assigned to the column "b":

library(tidyverse)
library(tidymodels)

# dummy data
a = 1:3
b = 4:6
c = 7:9
df <- data.frame(a,b,c)

# filter variable
col_name = "foo"

rec <- recipe(a ~., data = df) %>%
  update_role(
              case_when(
                col_name == "foo" ~ b, # also not working: .$b, df$b
                col_name == "foo2" ~ c), 
              new_role = "ID")
rec

Solution

  • Unfortunately, case_when() is not meant for the kind of dynamic variable selection you are trying to achieve: it is a vectorised function that returns values, whereas update_role() expects a column selection. Instead, I would suggest wrapping an if () in a function to perform the dynamic selection:

    library(tidyverse)
    library(tidymodels)
    
    # dummy data
    a = 1:3
    b = 4:6
    c = 7:9
    df <- data.frame(a,b,c)
    
    # filter variable
    col_name = "foo"
    
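    # assign the "ID" role to the column that matches col_name
    update_select <- function(recipe, col_name) {
      if (col_name == "foo") {
        update_role(recipe, b, new_role = "ID")
      } else if (col_name == "foo2") {
        update_role(recipe, c, new_role = "ID")
      } else {
        # no match: return the recipe unchanged instead of NULL
        recipe
      }
    }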
    
    rec <- recipe(a ~., data = df) %>%
      update_select(col_name)
    rec
    #> Data Recipe
    #> 
    #> Inputs:
    #> 
    #>       role #variables
    #>         ID          1
    #>    outcome          1
    #>  predictor          1
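
If the list of possible columns grows, the if/else chain can be swapped for a lookup, which is a different technique from the one above. The following is only a minimal sketch under a few assumptions: `id_cols` is a hypothetical named vector introduced here for illustration (it is not part of the original question), and the column is selected by name through `all_of()`, one of the tidyselect helpers that recipe selectors accept. Only the recipes package is loaded, since `library(tidymodels)` attaches it anyway.

```r
library(recipes)  # also attached by library(tidymodels)

# dummy data
df <- data.frame(a = 1:3, b = 4:6, c = 7:9)

# hypothetical lookup (an assumption for this sketch): filter value -> column name
id_cols <- c(foo = "b", foo2 = "c")

# filter variable
col_name <- "foo"

# all_of() selects the column whose name is stored as a string
rec <- recipe(a ~ ., data = df) %>%
  update_role(all_of(id_cols[[col_name]]), new_role = "ID")

# one row per variable, with its current role
summary(rec)
```

Unlike the if/else version, a `col_name` that is missing from `id_cols` makes `id_cols[[col_name]]` fail with an error rather than leaving the recipe untouched, so a check such as `col_name %in% names(id_cols)` may be worth adding first.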