I have the following data frame representing responses from five individuals. There are four different responses: Yes, No, Inc, and Vag.
data_test = data.frame(Val_Av=c("Yes", "No", "Inc", "Yes", "No", "Yes", "No", "Inc", "Vag", "Yes"),
                       Val_Am=c("No", "No", "No", "Inc", "No", "Yes", "Yes", "Inc", "Vag", NA),
                       Val_ZM=c(NA, NA, NA, "Yes", "No", NA, "No", "Inc", "Vag", "Yes"),
                       Val_FC=c("No", "No", "No", NA, "No", "Yes", "Yes", "Yes", "Inc", "No"),
                       Val_CL=c("Yes", "No", "Inc", "Yes", "No", "Yes", "No", NA, NA, "Yes"))
Val_Av Val_Am Val_ZM Val_FC Val_CL
1 Yes No <NA> No Yes
2 No No <NA> No No
3 Inc No <NA> No Inc
4 Yes Inc Yes <NA> Yes
5 No No No No No
6 Yes Yes <NA> Yes Yes
7 No Yes No Yes No
8 Inc Inc Inc Yes <NA>
9 Vag Vag Vag Inc <NA>
10 Yes <NA> Yes No Yes
I would like to create another variable that summarises the responses according to the following rules:
- If the non-missing response is the same in all the variables, write that response (e.g. row 2 -> No, row 5 -> No, row 6 -> Yes).
- If there is no Yes among the responses, concatenate all the unique values (e.g. row 3 -> Inc;No, row 9 -> Vag;Inc).
- If there is a Yes among the responses:
  - if there are strictly more Yes than other responses, write Yes (e.g. row 10 -> Yes, row 4 -> Yes);
  - if there are as many Yes as other responses, write "Dif" (e.g. row 1 -> Dif);
  - if there are strictly fewer Yes than other responses, concatenate all the unique values (e.g. row 8 -> Inc;Yes, row 7 -> No;Yes).
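The rules above can be translated almost literally into a function that summarises a single row. This is a minimal base R sketch under stated assumptions, not a definitive solution: the function name summarise_responses is hypothetical, missing values are assumed to be ignored (as the row 2 example suggests), and the Val_ columns are picked with startsWith() so that other variables in the real data are left out.

```r
# Hypothetical helper: summarise one row's answers with the rules above.
# x is a character vector holding the answers of one row, possibly with NA.
summarise_responses <- function(x) {
  x <- x[!is.na(x)]          # assumption: missing answers are ignored
  u <- unique(x)             # unique answers, in order of first appearance
  n_yes <- sum(x == "Yes")
  n_other <- sum(x != "Yes")
  if (length(u) == 1L) {
    u                          # same answer in every variable
  } else if (n_yes == 0L) {
    paste(u, collapse = ";")   # no Yes at all
  } else if (n_yes > n_other) {
    "Yes"                      # strictly more Yes than other answers
  } else if (n_yes == n_other) {
    "Dif"                      # as many Yes as other answers
  } else {
    paste(u, collapse = ";")   # strictly fewer Yes than other answers
  }
}

data_test <- data.frame(
  Val_Av = c("Yes", "No", "Inc", "Yes", "No", "Yes", "No", "Inc", "Vag", "Yes"),
  Val_Am = c("No", "No", "No", "Inc", "No", "Yes", "Yes", "Inc", "Vag", NA),
  Val_ZM = c(NA, NA, NA, "Yes", "No", NA, "No", "Inc", "Vag", "Yes"),
  Val_FC = c("No", "No", "No", NA, "No", "Yes", "Yes", "Yes", "Inc", "No"),
  Val_CL = c("Yes", "No", "Inc", "Yes", "No", "Yes", "No", NA, NA, "Yes")
)

# Keep only the Val_ columns; apply() passes each row to the helper as a character vector
val_cols <- startsWith(names(data_test), "Val_")
data_test$Val <- apply(data_test[val_cols], 1L, summarise_responses)
data_test$Val
#> [1] "Dif"     "No"      "Inc;No"  "Yes"     "No"      "Yes"     "No;Yes"
#> [8] "Inc;Yes" "Vag;Inc" "Yes"
```

Using if/else on a single row keeps each rule on its own line and avoids the nested ifelse() calls that are awkward for scalar conditions.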
In my real data, I have other variables that I don't want to take into account, so I would like to find a way using the across() and starts_with("Val_") functions. I have no idea how to proceed; any ideas would be appreciated. Thanks!
EDIT:
Using Friede's code, I tried to write simpler code, but my attempt is not working. Here is what I tried:
data_test %>%
  mutate(Val =
    across(starts_with('Val_'),
      function(x) case_when(
        length(unique(x[!is.na(x)])) == 1 ~ unique(x[!is.na(x)]),
        all(x != "Yes", na.rm = TRUE) ~ paste(unique(x[!is.na(x)]), collapse = ";"),
        any(x == "Yes", na.rm = TRUE) & sum(x == "Yes", na.rm = TRUE) == sum(x != "Yes", na.rm = TRUE) ~ "Dif",
        any(x == "Yes", na.rm = TRUE) & sum(x == "Yes", na.rm = TRUE) > sum(x != "Yes", na.rm = TRUE) ~ "Yes",
        any(x == "Yes", na.rm = TRUE) & sum(x == "Yes", na.rm=TRUE) < sum(x !="Yes", na.rm=TRUE) ~ paste(unique(x[!is.na(x)]), collapse = ";")
      )
    )
  )
I know how to use the ifelse() function (and I think I can manage with case_when()), but I don't know how to write the conditions.
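A likely reason the attempt above fails is that across() calls the function once per column: inside function(x), x holds all ten answers to one question rather than the five answers in one row, and mutate(Val = across(...)) stores the result as a data-frame column instead of a character vector. For row-by-row rules, dplyr offers rowwise() together with c_across(), which accepts the same tidyselect helpers such as starts_with(). The sketch below assumes dplyr is installed; summarise_row is a hypothetical helper name. Each concatenating branch returns paste(unique(x), collapse = ";"), which is always a single string (and equals the value itself when all answers agree), whereas unique(x) on its own can return several values and would not give one result per row.

```r
library(dplyr)

data_test <- data.frame(
  Val_Av = c("Yes", "No", "Inc", "Yes", "No", "Yes", "No", "Inc", "Vag", "Yes"),
  Val_Am = c("No", "No", "No", "Inc", "No", "Yes", "Yes", "Inc", "Vag", NA),
  Val_ZM = c(NA, NA, NA, "Yes", "No", NA, "No", "Inc", "Vag", "Yes"),
  Val_FC = c("No", "No", "No", NA, "No", "Yes", "Yes", "Yes", "Inc", "No"),
  Val_CL = c("Yes", "No", "Inc", "Yes", "No", "Yes", "No", NA, NA, "Yes")
)

# Hypothetical helper: x is the character vector of one row's answers (may contain NA)
summarise_row <- function(x) {
  x <- x[!is.na(x)]                      # assumption: missing answers are ignored
  u <- paste(unique(x), collapse = ";")  # always a single string
  n_yes <- sum(x == "Yes")
  n_other <- sum(x != "Yes")
  case_when(
    length(unique(x)) == 1L ~ u,         # same answer everywhere
    n_yes == 0L ~ u,                     # no Yes at all
    n_yes > n_other ~ "Yes",             # strictly more Yes
    n_yes == n_other ~ "Dif",            # as many Yes as other answers
    TRUE ~ u                             # strictly fewer Yes (catch-all branch)
  )
}

# rowwise() makes mutate() evaluate once per row; c_across() gathers the
# selected Val_ columns of the current row into one vector
data_test <- data_test %>%
  rowwise() %>%
  mutate(Val = summarise_row(c_across(starts_with("Val_")))) %>%
  ungroup()

data_test$Val
```

rowwise() is convenient, but it evaluates the helper once per row, so it may be slow on large data; a vectorised approach that counts answers for all rows at once avoids that.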
Using base R, I have explicitly rewritten your statements with apply() and a nested ifelse() statement. This is not the recommended approach in such cases.
apply(data_test, 1L, \(x)
  ifelse(length(unique(x[!is.na(x)])) == 1L,
    unique(x[!is.na(x)]),
    ifelse(all(x != "Yes", na.rm = TRUE),
      paste(unique(x[!is.na(x)]), collapse = ";"),
      ifelse(any(x == "Yes", na.rm = TRUE) & sum(x == "Yes", na.rm = TRUE) == sum(x != "Yes", na.rm = TRUE),
        "Dif",
        ifelse(any(x == "Yes", na.rm = TRUE) & sum(x == "Yes", na.rm = TRUE) > sum(x != "Yes", na.rm = TRUE),
          "Yes",
          paste(unique(x[!is.na(x)]), collapse = ";"))))))
#> [1] "Dif" "No" "Inc;No" "Yes" "No" "Yes" "No;Yes"
#> [8] "Inc;Yes" "Vag;Inc" "Yes"
Created on 2023-12-05 with reprex v2.0.2
Does this help you get started on a more elegant and less redundant approach?
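I haven't checked whether your rules cover every possible case. If they do not, it is good practice to add another ifelse() level that returns a default, say an "X" flag. Note that dplyr::case_when() has a default option (the .default argument since dplyr 1.1.0, or a final TRUE ~ ... condition).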
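As one possible less redundant variant, note that several rules give the same result: when all non-missing answers agree on a value other than Yes, concatenating the unique values returns just that value, and a row where everyone answered Yes has strictly more Yes than other answers. Under that reading only two special cases remain ("Dif" and "Yes"), and every other row gets the concatenation of its unique answers. The base R sketch below assumes that reading, keeps only the Val_ columns with startsWith(), and counts Yes and non-Yes answers for all rows at once with rowSums(), so ifelse() is used in its vectorised form. Because the last branch catches every remaining row, no separate default flag is needed in this variant.

```r
data_test <- data.frame(
  Val_Av = c("Yes", "No", "Inc", "Yes", "No", "Yes", "No", "Inc", "Vag", "Yes"),
  Val_Am = c("No", "No", "No", "Inc", "No", "Yes", "Yes", "Inc", "Vag", NA),
  Val_ZM = c(NA, NA, NA, "Yes", "No", NA, "No", "Inc", "Vag", "Yes"),
  Val_FC = c("No", "No", "No", NA, "No", "Yes", "Yes", "Yes", "Inc", "No"),
  Val_CL = c("Yes", "No", "Inc", "Yes", "No", "Yes", "No", NA, NA, "Yes")
)

# Use only the Val_ columns, so other variables in the real data are ignored
vals <- data_test[startsWith(names(data_test), "Val_")]

# Per-row counts of Yes and non-Yes answers; NA answers are not counted
n_yes   <- rowSums(vals == "Yes", na.rm = TRUE)
n_other <- rowSums(vals != "Yes", na.rm = TRUE)

# Unique non-missing answers per row, in order of first appearance
uniq <- apply(vals, 1L, \(x) paste(unique(x[!is.na(x)]), collapse = ";"))

# Only "Dif" and "Yes" need special handling; everything else is uniq
data_test$Val <- ifelse(n_yes > 0 & n_yes == n_other, "Dif",
                        ifelse(n_yes > n_other, "Yes", uniq))
data_test$Val
```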
Data
data_test = data.frame(Val_Av=c("Yes", "No", "Inc", "Yes", "No", "Yes", "No", "Inc", "Vag", "Yes"),
                       Val_Am=c("No", "No", "No", "Inc", "No", "Yes", "Yes", "Inc", "Vag", NA),
                       Val_ZM=c(NA, NA, NA, "Yes", "No", NA, "No", "Inc", "Vag", "Yes"),
                       Val_FC=c("No", "No", "No", NA, "No", "Yes", "Yes", "Yes", "Inc", "No"),
                       Val_CL=c("Yes", "No", "Inc", "Yes", "No", "Yes", "No", NA, NA, "Yes"))