Search code examples
rtidyversecase

How to dynamically create case_when() expressions based on variable names?


I have a dataframe with a set of revenue variables and a set of sales variables. I want to create a new set of variables that takes the value of the revenue variable and checks which sales variable it matches, and if it matches, it returns the corresponding year.

Here's a simplified example of what I have and what I'm trying to achieve:

library(tidyverse)
library(rlang)

# Creating an example dataframe
df <- tibble(
  revenues_1 = c(1, 3, 4),
  revenues_2 = c(2, 4, 5),
  sales_2005 = c(1, 2, 3),
  sales_2006 = c(2, 3, 4),
  sales_2007 = c(3, 4, 5)
)

# What I want to do:
# df <- df |> 
#  rowwise() |>
#  mutate(
#    year_1 = case_when(
#      revenues_1 == sales_2005 ~ 2005,
#      revenues_1 == sales_2006 ~ 2006,
#      revenues_1 == sales_2007 ~ 2007,
#      .default = NA
#    ),
#    year_2 = case_when(
#      revenues_2 == sales_2005 ~ 2005,
#      revenues_2 == sales_2006 ~ 2006,
#      revenues_2 == sales_2007 ~ 2007,
#      .default = NA
#    )
#  )

However, this is not scalable if I have many years and many revenue variables. How can I make this more efficient?

One idea is (not working):

# Create a list of expressions for each year
years_exprs <- map(2005:2007, ~parse_expr(paste0("var == 'sales_", .x, "' ~ ", .x)))

# Use this list in a case_when statement
df <- df |> 
  rowwise() |>
  mutate(across(starts_with("revenues_"), function(var) {
    case_when(
      !!!years_exprs,
      .default = NA
    )
  }, .names = "year_{.col}")) |> 
  ungroup()

Solution

  • If I understand you correctly, using your existing code you could use dplyr's mutate, across and starts_with, then place the function in a named list to create the new variables.

    In order to reduce the effort to create individual case_when logics for multiple years, you could first use rlang::parse_exprs() with paste0:

    yrs <- 2005:2007
    years_exprs <- rlang::parse_exprs(paste0(".x == sales_", yrs, " ~ ", yrs))
    

    Then evaluate it using !!!:

    library(dplyr)
    
    df %>% mutate(across(starts_with("revenues_"), 
                         list(year = ~case_when(!!!years_exprs))))
    

    output:

      revenues_1 revenues_2 sales_2005 sales_2006 sales_2007 revenues_1_year revenues_2_year
           <dbl>      <dbl>      <dbl>      <dbl>      <dbl>           <dbl>           <dbl>
    1          1          2          1          2          3            2005            2006
    2          3          4          2          3          4            2006            2007
    3          4          5          3          4          5            2006            2007
    

    Note: if your variables don't necessarily all start with "revenues", you could create a vector of desired columns regardless if they share the same pattern (here, mut_vars), then use all_of():

    mut_vars <- c("revenues_1", "revenues_2")
    df %>% mutate(across(all_of(mut_vars), 
                         list(year = ~case_when(!!!years_exprs))))