
Swapping strings or values in grouped data based on condition


The following data frame is grouped by the id variable. For each id and for each of the variables X, Y, and Z, I wish to replace "no" with "yes" on the first row if and only if that id has "yes" in row(s) other than the first row; those later "yes" values should then become "no" (in effect, the "yes" moves up to the first row).

id <- c(1,1,1,2,2,3,3)
X <- c("yes", "no", "no", "no", "no", "no", "no")
Y <- c("no", "no", "yes", "no", "yes", "no", "no")
Z <- c("no", "yes", "no", "no", "no", "no", "no")
df <- data.frame(id, X, Y, Z)

The expected output is:

id   X   Y   Z
 1 yes yes yes
 1  no  no  no
 1  no  no  no
 2  no yes  no
 2  no  no  no
 3  no  no  no
 3  no  no  no

I tried using the ifelse() function but ran into difficulties because of the grouping. Any help would be appreciated. Thank you!


Solution

  • Here is a dplyr solution using case_when():

    We check each group of rows sharing the same id:

    For each of the columns X, Y, and Z: if any row within the group has "yes", the first row of the group is set to "yes", and any "yes" in the later rows of the group is flipped to "no". All other values remain unchanged.

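    library(dplyr)  # .by and .default require dplyr >= 1.1.0
    
    df %>%
      mutate(
        across(X:Z, ~ case_when(
          # first row of the group: "yes" if any row in the group has "yes"
          row_number() == 1 & any(.x == "yes") ~ "yes",
          # later rows: any "yes" becomes "no"
          row_number() > 1 & .x == "yes" ~ "no",
          # everything else is left as it was
          .default = .x)),
        .by = id)  # apply the rule separately within each id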
    
     id   X   Y   Z
    1  1 yes yes yes
    2  1  no  no  no
    3  1  no  no  no
    4  2  no yes  no
    5  2  no  no  no
    6  3  no  no  no
    7  3  no  no  no
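Since the question mentions trying ifelse(), here is a base-R sketch (not part of the original answer) that applies the same rule with ifelse() inside ave(). ave() calls a function on each id subset and writes the results back in the original row order, which supplies the grouping that a plain ifelse() lacks. The helper name move_yes_up is hypothetical, and the sketch assumes X, Y, and Z are character columns (hence stringsAsFactors = FALSE, which only matters before R 4.0).

```r
# Base-R sketch: move a "yes" up to the first row, separately within each id.
id <- c(1, 1, 1, 2, 2, 3, 3)
X <- c("yes", "no", "no", "no", "no", "no", "no")
Y <- c("no", "no", "yes", "no", "yes", "no", "no")
Z <- c("no", "yes", "no", "no", "no", "no", "no")
df <- data.frame(id, X, Y, Z, stringsAsFactors = FALSE)

# Hypothetical helper: the rule for one column within one id group.
# The first element becomes "yes" if any element is "yes";
# any "yes" in a later element becomes "no"; everything else is unchanged.
move_yes_up <- function(v) {
  first <- seq_along(v) == 1
  ifelse(first & any(v == "yes"), "yes",
         ifelse(!first & v == "yes", "no", v))
}

cols <- c("X", "Y", "Z")
# ave() splits each column by id, applies move_yes_up to each piece,
# and reassembles the pieces in the original row order.
df[cols] <- lapply(df[cols], function(col) ave(col, df$id, FUN = move_yes_up))
print(df)
```

The nested ifelse() mirrors the three branches of the case_when() above; ave() is used only for its split-apply-combine behaviour, so no extra packages are needed.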