Search code examples
rdplyrcase

Mutate across columns case_when to make a new "Flag" column


I have a huge dataset (df) containing numerous columns listing information on number of people on treatment x (e.g. Treat_1) per country (ISO3) and per year (Year).

I need to generate a new column that is called "Flag_Treat_X" (where X stands for the original column name e.g Treat_1). In this column I need to record any treatment decline compared to the previous year as "down", increases as "up" and NA where there is missing data. All columns containing treatment information have "_" in their names, but the actual df uses complex names like HIV_treatment, etc instead of a combination of string and numbers as used here. I know several ways to do it column by column (see below) but the dataset is so huge and the variable names changes often that I need an automated way of doing this.

Here is the code for a simplified example:

ISO3 <- c("AFG", "AFG","AFG", "BEN", "BEN","BEN", "GIN", "GIN", "GIN", "ZWE", "ZWE", "ZWE")
Year <- c(2020, 2021, 2022, 2020, 2021, 2022, 2020, 2021, 2022, 2020, 2021, 2022)
Treat_1 <- c(100, 110, 120, 300, 330, 360, 200, 220, 100, 300, NA, 320)

df = data.frame(ISO3, Year, Treat_1)

Here is an example of how I could do it column by column by hand

df_poutcome %>%
   group_by(ISO3) %>%
   mutate(Target_art_n_pf = case_when(Treat_1<lag(Treat_1) ~ "down",
                                      Treat_1>lag(Treat_1) ~ "up", 
                                      TRUE ~ as.character(NA)))

Here is an example that "nearly" works using a loop (not very elegant)

for (i in grep("_", names(temp), value=TRUE)){
 
  varname = ((gsub(" ", "", paste("Flag_",i))))
  temp = temp %>%
       group_by(ISO3) %>%
       mutate(!!varname:= case_when(i<lag(i) ~ "down",
                                   i>lag(i) ~ "up"))

}

The error is that I only get NA values for all the new "Flag_[...]" variables and the code

TRUE ~ as.character(NA)

does not run in the case_when code so I have to remove it.

The desired outcome should look as follows:

ISO3 Year Treat_1 Flag_Treat_1
AFG 2020 100 NA
AFG 2021 110 up
AFG 2022 120 up
BEN 2020 300 NA
BEN 2021 330 up
BEN 2022 360 up
GIN 2020 200 NA
GIN 2021 220 up
GIN 2022 100 down
ZWE 2020 300 NA
ZWE 2021 NA NA
ZWE 2022 320 NA

Thank in advance for any help.


Solution

  • Thank you so much to everyone for the quick responses. I ended up using a blend of the responses to make it run on my side. Code below.

    
    df %>%
      group_by(ISO3) %>%
      mutate(across(contains("_"),
                    ~case_when(
                      .x < lag(.x) ~ "down",
                      .x > lag(.x) ~ "up", 
                      TRUE ~ as.character(NA)
                    ),
                    .names = "flag_{.col}"
      ))
    

    I had to change the \(x) suggested by Gregor Thomas to the code suggested by jdobres ~case_when( .x < lag(.x) ~ "down", .x > lag(.x) ~ "up", TRUE ~ as.character(NA) ) as it was giving me an unexpected symbol