Search code examples
rdplyrcase

Applying case_when to dataframe based on list of functions in another dataframe


I am trying to apply case_when to selected columns in dataframe_1 based on a list of case_when functions in another dataframe.

For example, using the mtcars dataframe:

mpg cyl
21 6
21 6
22.8 4

I would like to use the below dataframe (dataframe_2) to mutate the columns.

dataframe_2 = data.frame(variable = c(c("mpg", "cyl")), function_to_use = c('case_when(mpg == 21 ~ "A", mpg == 22.8 ~ "B"', 'case_when(cyl == 6 ~ "X", cyl == 4 ~ "Y"'))

variable function_to_use
mpg case_when(mpg == 21 ~ "A", mpg == 22.8 ~ "B"
cyl case_when(cyl == 6 ~ "X", cyl == 4 ~ "Y"

The expected outcome would be:

mpg cyl
A X
A X
B Y

I would like to accomplish this dynamically using dataframe_2.

I have tried the below code.

mtcars %>% mutate(across(dataframe_2$variable ~ !!!parse_exprs(dataframe_2$function_to_use)))

I have noticed that the below code works, but does not change the original columns.

a <- ‘case_when(mpg == 21 ~ "A", mpg == 22.8 ~ "B"; case_when(cyl == 6 ~ "X", cyl == 4 ~ "Y"‘
mtcars %>% mutate(across(c("mpg", "cyl") ~ !!!parse_exprs(a)))

Edit: extension to the problem.

My dataframes are actually dataframes in a list:

dataframe_1 <- list(dataframe_1a = mtcars[1:3, c("mpg", "cyl")], dataframe_1b = mtcars[4:6, c("mpg", "cyl")])
dataframe_2 <- list(dataframe_2a = data.frame(variable = c(c("mpg", "cyl")), function_to_use = c('case_when(mpg == 21 ~ "A", mpg == 22.8 ~ "B"', 
                                             'case_when(cyl == 6 ~ "X", cyl == 4 ~ "Y"')), 
dataframe_2b = data.frame(variable = c(c("mpg", "cyl")), function_to_use = c('case_when(mpg > 21 ~ "A", TRUE ~ "B"',
'case_when(cyl >7 ~ "X", TRUE ~ 4 ~ "Y"'))

How would I apply each function to their respective dataframe pairs? I've tried using map2 but with no success. Thank you for your help!


Solution

  • Using purrr::reduce2 you could loop over the rows of dataframe_2 like so:

    Note: Your case_when statements missed the closing parenthesis.

    library(purrr)
    library(dplyr, warn = FALSE)
    
    dataframe_1 <- mtcars[1:3, c("mpg", "cyl")]
    
    dataframe_2 <- data.frame(
      variable = c(c("mpg", "cyl")),
      function_to_use = c(
        'case_when(mpg == 21 ~ "A", mpg == 22.8 ~ "B")',
        'case_when(cyl == 6 ~ "X", cyl == 4 ~ "Y")'
      )
    )
    
    dataframe_1 |>
      purrr::reduce2(
        dataframe_2$variable, dataframe_2$function_to_use,
        \(x, y, z) mutate(x, across(all_of(y), ~ !!!rlang::parse_exprs(z))),
        .init = _
      )
    #>               mpg cyl
    #> Mazda RX4       A   X
    #> Mazda RX4 Wag   A   X
    #> Datsun 710      B   Y
    

    EDIT For your more general problem you could wrap the code in a function, then use purrr::map2 to loop over the lists of data frames:

    library(purrr)
    library(dplyr, warn = FALSE)
    library(rlang)
    
    myfun <- function(x, y) {
      x |>
        purrr::reduce2(
          y$variable, y$function_to_use,
          \(x, y, z) mutate(x, across(all_of(y), ~ !!!rlang::parse_exprs(z))),
          .init = _
        )
    }
    
    purrr::map2(dataframe_1, dataframe_2, myfun)
    #> $dataframe_1a
    #>               mpg cyl
    #> Mazda RX4       A   X
    #> Mazda RX4 Wag   A   X
    #> Datsun 710      B   Y
    #> 
    #> $dataframe_1b
    #>                   mpg cyl
    #> Hornet 4 Drive      A   Y
    #> Hornet Sportabout   B   X
    #> Valiant             B   Y