This is my second question on Stack Overflow, so all tips are welcome :)
For clinical research I have to recode many dichotomous baseline characteristics that contain several variations of "yes" and "no".
Currently I am recoding these variables one by one, but that takes many lines of code, and the variations are quite similar across the different variables. Unknown or NA values should be recoded to 0.
Example:
library(dplyr)
A <- c("Yes", "y", "no", "n", "UK")
B <- c("yes", "Yes", "y", "no", "no")
C <- c("Y", "y", "n", "no", "uk")
# Attempt 1: recode each variable one by one
A <- recode(A, "Yes" = "yes", "y" = "yes", "n" = "no", "UK" = "no")
B <- recode(B, "Yes" = "yes", "y" = "yes")
C <- recode(C, "Y" = "yes", "y" = "yes", "n" = "no", "uk" = "no")
# Attempt 2: map all variants with a named list
A <- factor(A)  # assigning a list to levels() only merges levels on a factor
levels(A) <- list("yes"=c("Likely", "y", "Y", "Yes", "yes"), "no" = c("", "No", "UK", "no", "N", "n"))
I was wondering whether there is a way to apply this list approach to a list or vector that contains all of A, B and C. Or is there another, easier and more efficient way to recode these variables?
Any help would be great :)
If the vectors are all the same length, you can put them in a data frame; if they differ in length, put them in a list. Either way, you can then use lapply to apply the same function to each of them. You can use forcats::fct_collapse to collapse multiple levels into one.
list_vec <- list(A, B, C)
# Collapse every spelling variant into "yes" or "no" in each vector
list_vec <- lapply(list_vec, function(x) forcats::fct_collapse(x,
  "yes" = c("Likely", "y", "Y", "Yes", "yes"),
  "no" = c("", "No", "UK", "no", "N", "n", "uk")))
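For the data frame case mentioned above, here is a rough base-R sketch. It swaps fct_collapse for a different technique: lower-casing the values first, so fewer spelling variants need to be listed. The helper name recode_yes_no and the set of "yes" spellings are assumptions for illustration only. Anything not recognised as "yes", including NA and empty strings, becomes "no", which is one reading of the "unknown or NA" requirement in the question.

```r
# Hypothetical helper (not from forcats or dplyr): normalise case and
# whitespace, then treat every value outside `yes_values` as "no".
recode_yes_no <- function(x, yes_values = c("likely", "y", "yes")) {
  key <- tolower(trimws(as.character(x)))
  # %in% returns FALSE for NA, so NA falls through to "no"
  factor(ifelse(key %in% yes_values, "yes", "no"), levels = c("no", "yes"))
}

A <- c("Yes", "y", "no", "n", "UK")
B <- c("yes", "Yes", "y", "no", "no")
C <- c("Y", "y", "n", "no", "uk")

# Same-length vectors fit in a data frame; df[] keeps the data frame
# structure while lapply replaces each column with its recoded version.
df <- data.frame(A, B, C, stringsAsFactors = FALSE)
df[] <- lapply(df, recode_yes_no)
```

One caveat with this design: a typo such as "yess" is silently counted as "no", so it may be worth checking table(unlist(lapply(list(A, B, C), tolower)), useNA = "ifany") on the raw values before recoding.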