I have a dataset in which all the yes/no variables were entered as free text (facepalm).
At first I tried to apply the fct_collapse function to each column of the data frame individually, but that takes a lot of code, since there are 50+ yes/no columns.
pid = c(1,2,3,4,5)
a = c("y", "Y", "no", "no", "NO")
b = c("yes", "Y", "y", "no", "n")
c = c("Y", "no", "n", "no", "No")
df <- data.frame(pid, a, b, c)
I tried
df$a <- fct_collapse(df$a, yes = c("y", "Y"), no = c("no", "NO"))
But I guess this will take many lines of code. Is it possible to do it with one line of code with an apply function or mutate in combination with across?
EDIT: the output I am looking for is
a2 = c("yes", "yes", "no", "no", "no")
b2 = c("yes", "yes", "yes", "no", "no")
c2 = c("yes", "no", "no", "no", "no")
df2 <- data.frame(pid,a2,b2,c2)
We can use across to loop over the columns:
library(dplyr)
library(forcats)
# forcats may warn about listed levels that a given column does not contain
df %>%
  mutate(across(-pid, ~ fct_collapse(.,
    yes = c('y', 'Y'), no = c('no', 'NO', 'No', 'n'))))
-output
#   pid   a   b   c
# 1   1 yes yes yes
# 2   2 yes yes  no
# 3   3  no yes  no
# 4   4  no  no  no
# 5   5  no  no  no
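
With 50+ free-text columns, listing every spelling by hand is easy to get wrong (a variant such as "No" is easy to miss). As a rough alternative sketch, not part of the original answer, the values could be normalised by their first letter instead. The helper name normalise_yes_no is hypothetical, and the sketch assumes every valid answer starts with "y" or "n" (any case); anything else becomes NA so it can be inspected.

```r
# Example data from the question, with pid included in the data frame
df <- data.frame(
  pid = c(1, 2, 3, 4, 5),
  a = c("y", "Y", "no", "no", "NO"),
  b = c("yes", "Y", "y", "no", "n"),
  c = c("Y", "no", "n", "no", "No")
)

# Hypothetical helper: map free-text answers to "yes"/"no" by first letter.
# Assumption: valid answers start with "y" or "n"; everything else becomes NA.
normalise_yes_no <- function(x) {
  first <- substr(tolower(trimws(as.character(x))), 1, 1)
  out <- rep(NA_character_, length(x))
  out[first %in% "y"] <- "yes"
  out[first %in% "n"] <- "no"
  out
}

# Apply the helper to every column except pid (base R, no packages needed)
cols <- setdiff(names(df), "pid")
df[cols] <- lapply(df[cols], normalise_yes_no)
df
```

The same helper should also slot into the dplyr approach, e.g. mutate(across(-pid, normalise_yes_no)). Checking colSums(is.na(df[cols])) afterwards is one way to spot entries that matched neither letter.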