
Mutate a new variable using several conditions based on different variables


I have the following data frame representing responses from five individuals. There are four possible responses: Yes, No, Inc, Vag.

data_test = data.frame(Val_Av=c("Yes", "No", "Inc", "Yes", "No", "Yes", "No", "Inc", "Vag", "Yes"),
           Val_Am=c("No", "No", "No", "Inc", "No", "Yes", "Yes", "Inc", "Vag", NA),
           Val_ZM=c(NA, NA, NA, "Yes", "No", NA, "No", "Inc", "Vag", "Yes"),
           Val_FC=c("No", "No", "No", NA, "No", "Yes", "Yes", "Yes", "Inc", "No"),
           Val_CL=c("Yes", "No", "Inc", "Yes", "No", "Yes", "No", NA, NA, "Yes"))

  Val_Av Val_Am Val_ZM Val_FC Val_CL
1     Yes     No   <NA>     No    Yes
2      No     No   <NA>     No     No
3     Inc     No   <NA>     No    Inc
4     Yes    Inc    Yes   <NA>    Yes
5      No     No     No     No     No
6     Yes    Yes   <NA>    Yes    Yes
7      No    Yes     No    Yes     No
8     Inc    Inc    Inc    Yes   <NA>
9     Vag    Vag    Vag    Inc   <NA>
10    Yes   <NA>    Yes     No    Yes

I would like to create another variable summarising the responses according to the following rules:

  • If the response is the same in all the variables (ignoring NA), write that response (e.g. line 2 -> No, line 5 -> No, line 6 -> Yes)

  • If there is no Yes among the responses, concatenate all the unique values (e.g. line 3 -> Inc;No, line 9 -> Vag;Inc)

  • If there is a Yes among the responses AND

       - if there are strictly more Yes than other responses, write Yes (e.g. line 10 -> Yes, line 4 -> Yes)
       - if there are as many Yes as other responses, write "Dif" (e.g. line 1 -> Dif)
       - if there are strictly fewer Yes than other responses, concatenate all the unique values (e.g. line 8 -> Inc;Yes, line 7 -> No;Yes)
    

In my real data, I have other variables that I don't want to take into account. Therefore, I would like to find a way using the across() and starts_with("Val_") functions. I have no idea how to proceed and would appreciate any ideas. Thanks.

EDIT:

Using Friede's code, I tried to write simpler code, but my attempt is not working. Here is what I have tried:

data_test %>%
  mutate(Val=
    across(starts_with('Val_'), 
           function(x) case_when(
             length(unique(x[!is.na(x)])) == 1 ~  unique(x[!is.na(x)]), 
             all(x != "Yes", na.rm = TRUE)  ~ paste(unique(x[!is.na(x)]), collapse = ";"), 
             any(x == "Yes", na.rm = TRUE) & sum(x == "Yes", na.rm = TRUE) == sum(x != "Yes", na.rm = TRUE) ~ "Dif", 
             any(x == "Yes", na.rm = TRUE) & sum(x == "Yes", na.rm = TRUE) > sum(x != "Yes", na.rm = TRUE) ~ "Yes",
             any(x == "Yes", na.rm = TRUE) & sum(x == "Yes", na.rm=TRUE) < sum(x !="Yes", na.rm=TRUE) ~ paste(unique(x[!is.na(x)]), collapse = ";")
           )
    )
  )

Solution

  • I know how to use the ifelse function (and I think I can manage with the case_when) but I don't know how to write the conditions.

    Using base R I have explicitly rewritten your statements with apply and a nested ifelse statement. This is not the recommended approach in such cases.

    apply(data_test, 1L, \(x) 
          ifelse(length(unique(x[!is.na(x)])) == 1L, 
                 unique(x[!is.na(x)]), 
                 ifelse(all(x != "Yes", na.rm = TRUE), 
                        paste(unique(x[!is.na(x)]), collapse = ";"),
                        ifelse(any(x == "Yes", na.rm = TRUE) & sum(x == "Yes", na.rm = TRUE) == sum(x != "Yes", na.rm = TRUE), 
                               "Dif", 
                               ifelse(any(x == "Yes", na.rm = TRUE) & sum(x == "Yes", na.rm = TRUE) > sum(x != "Yes", na.rm = TRUE), 
                                      "Yes", 
                                      paste(unique(x[!is.na(x)]), collapse = ";"))))))
    #>  [1] "Dif"     "No"      "Inc;No"  "Yes"     "No"      "Yes"     "No;Yes" 
    #>  [8] "Inc;Yes" "Vag;Inc" "Yes"
    

    Created on 2023-12-05 with reprex v2.0.2

    Does this get you started to write a more elegant and less redundant approach?
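As one possible less redundant version of the nested ifelse above, the rules can be moved into a small helper that is applied row by row to the Val_ columns only. This is a sketch, not a definitive implementation: the helper name summarise_responses and the extra id column are hypothetical, and returning NA for a row with no responses at all is an assumption, since the question does not cover that case.

```r
# Data from the question, plus a hypothetical `id` column that must be ignored.
data_test <- data.frame(
  id     = 1:10,
  Val_Av = c("Yes", "No", "Inc", "Yes", "No", "Yes", "No", "Inc", "Vag", "Yes"),
  Val_Am = c("No", "No", "No", "Inc", "No", "Yes", "Yes", "Inc", "Vag", NA),
  Val_ZM = c(NA, NA, NA, "Yes", "No", NA, "No", "Inc", "Vag", "Yes"),
  Val_FC = c("No", "No", "No", NA, "No", "Yes", "Yes", "Yes", "Inc", "No"),
  Val_CL = c("Yes", "No", "Inc", "Yes", "No", "Yes", "No", NA, NA, "Yes")
)

# Hypothetical helper: summarise one row of responses (character vector).
summarise_responses <- function(x) {
  x    <- x[!is.na(x)]                           # ignore missing responses
  vals <- unique(x)
  if (length(vals) == 0L) return(NA_character_)  # assumption: all-NA row gives NA
  if (length(vals) == 1L) return(vals)           # same response everywhere
  n_yes   <- sum(x == "Yes")
  n_other <- length(x) - n_yes
  if (n_yes > n_other)  return("Yes")            # strictly more Yes
  if (n_yes == n_other) return("Dif")            # as many Yes as other responses
  paste(vals, collapse = ";")                    # no Yes, or Yes in the minority
}

# Select only the columns whose names start with "Val_".
val_cols <- startsWith(names(data_test), "Val_")
data_test$Val <- apply(data_test[val_cols], 1L, summarise_responses)

data_test$Val
```

Because each condition is tested once with if and an early return, the counts of Yes and non-Yes responses are computed a single time per row rather than repeated in every branch.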

    Haven't checked. If your statements do not cover every possibility, it is a good idea to add another ifelse with a default value, say, an "X" flag. Note that dplyr::case_when() has a default option.
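To illustrate the default option of case_when() together with starts_with("Val_"), here is a minimal dplyr sketch. It assumes dplyr 1.1.0 or later (for the .default argument) and works row by row with rowwise() and c_across() rather than across(), because the rules compare values within a row, not within a column. The helper name summarise_case and the extra id column are hypothetical, and returning NA for a row with no responses is an assumption.

```r
library(dplyr)

# Data from the question, plus a hypothetical `id` column that must be ignored.
data_test <- data.frame(
  id     = 1:10,
  Val_Av = c("Yes", "No", "Inc", "Yes", "No", "Yes", "No", "Inc", "Vag", "Yes"),
  Val_Am = c("No", "No", "No", "Inc", "No", "Yes", "Yes", "Inc", "Vag", NA),
  Val_ZM = c(NA, NA, NA, "Yes", "No", NA, "No", "Inc", "Vag", "Yes"),
  Val_FC = c("No", "No", "No", NA, "No", "Yes", "Yes", "Yes", "Inc", "No"),
  Val_CL = c("Yes", "No", "Inc", "Yes", "No", "Yes", "No", NA, NA, "Yes")
)

# Hypothetical helper: apply the rules to the responses of a single row.
summarise_case <- function(x) {
  x       <- x[!is.na(x)]            # ignore missing responses
  n_yes   <- sum(x == "Yes")
  n_other <- length(x) - n_yes
  case_when(
    length(x) == 0L     ~ NA_character_,  # assumption: all-NA row gives NA
    n_distinct(x) == 1L ~ x[1],           # same response everywhere
    n_yes > n_other     ~ "Yes",          # strictly more Yes
    n_yes == n_other    ~ "Dif",          # as many Yes as other responses
    .default = paste(unique(x), collapse = ";")  # every remaining case
  )
}

result <- data_test %>%
  rowwise() %>%                                            # evaluate one row at a time
  mutate(Val = summarise_case(c_across(starts_with("Val_")))) %>%
  ungroup()

result$Val
```

The .default branch covers both remaining rules (no Yes at all, and Yes in the minority), since both ask for the unique values to be concatenated.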

    Data

    data_test = data.frame(Val_Av=c("Yes", "No", "Inc", "Yes", "No", "Yes", "No", "Inc", "Vag", "Yes"),
                           Val_Am=c("No", "No", "No", "Inc", "No", "Yes", "Yes", "Inc", "Vag", NA),
                           Val_ZM=c(NA, NA, NA, "Yes", "No", NA, "No", "Inc", "Vag", "Yes"),
                           Val_FC=c("No", "No", "No", NA, "No", "Yes", "Yes", "Yes", "Inc", "No"),
                           Val_CL=c("Yes", "No", "Inc", "Yes", "No", "Yes", "No", NA, NA, "Yes"))