I have a dataframe with a set of revenue variables and a set of sales variables. I want to create a new set of variables that takes the value of the revenue variable and checks which sales variable it matches, and if it matches, it returns the corresponding year.
Here's a simplified example of what I have and what I'm trying to achieve:
library(tidyverse)
library(rlang)
# Creating an example dataframe
df <- tibble(
revenues_1 = c(1, 3, 4),
revenues_2 = c(2, 4, 5),
sales_2005 = c(1, 2, 3),
sales_2006 = c(2, 3, 4),
sales_2007 = c(3, 4, 5)
)
# What I want to do:
# df <- df |>
# rowwise() |>
# mutate(
# year_1 = case_when(
# revenues_1 == sales_2005 ~ 2005,
# revenues_1 == sales_2006 ~ 2006,
# revenues_1 == sales_2007 ~ 2007,
# .default = NA
# ),
# year_2 = case_when(
# revenues_2 == sales_2005 ~ 2005,
# revenues_2 == sales_2006 ~ 2006,
# revenues_2 == sales_2007 ~ 2007,
# .default = NA
# )
# )
However, this is not scalable if I have many years and many revenue variables. How can I make this more efficient?
One idea is (not working):
# Create a list of expressions for each year
years_exprs <- map(2005:2007, ~parse_expr(paste0("var == 'sales_", .x, "' ~ ", .x)))
# Use this list in a case_when statement
df <- df |>
rowwise() |>
mutate(across(starts_with("revenues_"), function(var) {
case_when(
!!!years_exprs,
.default = NA
)
}, .names = "year_{.col}")) |>
ungroup()
If I understand you correctly, using your existing code you could use dplyr
's mutate
, across
and starts_with
, then place the function in a named list to create the new variables.
In order to reduce the effort to create individual case_when
logics for multiple years, you could first use rlang::parse_exprs()
with paste0
:
yrs <- 2005:2007
years_exprs <- rlang::parse_exprs(paste0(".x == sales_", yrs, " ~ ", yrs))
Then evaluate it using !!!
:
library(dplyr)
df %>% mutate(across(starts_with("revenues_"),
list(year = ~case_when(!!!years_exprs))))
output:
revenues_1 revenues_2 sales_2005 sales_2006 sales_2007 revenues_1_year revenues_2_year
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 2 1 2 3 2005 2006
2 3 4 2 3 4 2006 2007
3 4 5 3 4 5 2006 2007
Note: if your variables don't necessarily all start with "revenues", you could create a vector of desired columns regardless if they share the same pattern (here, mut_vars
), then use all_of()
:
mut_vars <- c("revenues_1", "revenues_2")
df %>% mutate(across(all_of(mut_vars),
list(year = ~case_when(!!!years_exprs))))