
Apply fct_collapse to multiple columns at once


I have a dataset in which all the yes/no variables were entered as free text (facepalm).

At first I tried applying fct_collapse to each column of the dataframe individually, but that takes a lot of code, since there are 50+ yes/no columns.

pid = c(1,2,3,4,5)
a = c("y", "Y", "no", "no", "NO")
b = c("yes", "Y", "y", "no", "n")
c = c("Y", "no", "n", "no", "No")
df <- data.frame(pid,a,b,c)

I tried

df$a <- fct_collapse(df$a, yes = c("y", "Y"), no = c("no", "NO"))

But repeating this for every column would take many lines of code. Is it possible to do it in one line, with an apply function or with mutate in combination with across?

EDIT: the output I am looking for is

a2 = c("yes", "yes", "no", "no", "no")
b2 = c("yes", "yes", "yes", "no", "no")
c2 = c("yes", "no", "no", "no", "no")
df2 <- data.frame(pid,a2,b2,c2)

Solution

  • We can use across to loop over the columns

    library(dplyr)
    library(forcats)
    df %>%
        mutate(across(-pid, ~ fct_collapse(.,
         yes = c('y', 'Y'), no = c('no', 'NO', 'No', 'n'))))
    

    -output

    #   pid   a   b   c
    #1   1 yes yes yes
    #2   2 yes yes  no
    #3   3  no yes  no
    #4   4  no  no  no
    #5   5  no  no  no
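
  • Listing every capitalization by hand is easy to get wrong, because any spelling that is not listed is left uncollapsed. As a rough sketch (not part of the answer above), the values could be lower-cased and trimmed before collapsing, so that only the lower-case variants need to be listed. Here clean_yes_no is a hypothetical helper name, and wrapping the call in suppressWarnings() is an assumption about what you want: forcats warns when a listed level does not occur in a given column.

```r
library(dplyr)
library(forcats)

# Example data from the question, including the pid column
df <- data.frame(
  pid = c(1, 2, 3, 4, 5),
  a = c("y", "Y", "no", "no", "NO"),
  b = c("yes", "Y", "y", "no", "n"),
  c = c("Y", "no", "n", "no", "No")
)

# Hypothetical helper: normalize case and whitespace, then collapse
clean_yes_no <- function(x) {
  x <- tolower(trimws(as.character(x)))
  # Not every column contains every listed spelling; forcats warns about
  # the missing ones, so the warning is silenced here (an assumption)
  suppressWarnings(
    fct_collapse(x, yes = c("y", "yes"), no = c("n", "no"))
  )
}

# Apply the helper to every column except pid
df2 <- df %>%
  mutate(across(-pid, clean_yes_no))

# Inspect the remaining levels per column to spot spellings not covered
lapply(df2[-1], levels)
```

    Any level other than yes or no in the last call points to a free-text variant the helper does not cover yet, which can then be added to the corresponding vector.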