I have the following data:
test <- tibble(id = 1:4, name = letters[1:4], revenue = seq(100, 400, 100))
From this, I create lagged variables:
test_lagged <-
test %>%
mutate(
revenue_lag_1 = lag(revenue, 1),
revenue_lag_2 = lag(revenue, 2),
revenue_lag_3 = lag(revenue, 3),
across(
contains("revenue"),
~replace_na(., 0)
)
)
I already have an approach that helps me do the above step dynamically, so that I can specify the number om lags and lagged variables in a function. However, the next step, I have only managed to implement explicitly (by mapping each column explicitly):
test_lagged %>%
mutate(
any_previous_zero_revenue = case_when(
revenue_lag_1 == 0 |
revenue_lag_2 == 0 |
revenue_lag_3 == 0 ~ TRUE,
TRUE ~ FALSE
)
)
What I want is to be able to apply this dynamically in a case_when function by using something like "any_of", "all_of", contains(), starts_with() or similar.
First, I want to say that for all columns that contains/starts with "revenue_", check if "any_of" these variables (for a given line) == 0. That way I can create dynamically as many lagged variables as I want to without having to change the case_when function.
Here a solution
library(tidyverse)
test_lagged %>%
mutate(
any_previous_zero_revenue = if_else(if_any(.cols = starts_with("revenue_lag"),`==`,0),TRUE,FALSE)
)
# A tibble: 4 x 7
id name revenue revenue_lag_1 revenue_lag_2 revenue_lag_3 any_previous_zero_revenue
<int> <chr> <dbl> <dbl> <dbl> <dbl> <lgl>
1 1 a 100 0 0 0 TRUE
2 2 b 200 100 0 0 TRUE
3 3 c 300 200 100 0 TRUE
4 4 d 400 300 200 100 FALSE