Tags: r, function, refactoring, vectorization, forcats

Is there a vectorized method for replacing factor levels in the tidyverse?


I want to efficiently recode the factor levels of a large number of variables (columns) in a data frame by replacing one of the levels with the name of the variable (column). For example, I start with:

library(tibble)

Health <- tibble(Anemia = c("yes", "no", "no"),
                 BloodPressure = c("no", "yes", "no"),
                 Asthma = c("no", "no", "yes"))

and I want the output to look like this:

Health2 <- tibble(Anemia = c("Anemia", "no", "no"),
                  BloodPressure = c("no", "BloodPressure", "no"),
                  Asthma = c("no", "no", "Asthma"))

I want to get this output without changing each level by hand, because I have a database with 100 or so variables to recode. I tried to create a function to do this:

Med_rename <- function(x) {
  levels = c(no = "no", names(x) ="yes")
  fct_recode(x, !!!levels)
}

Med_rename2 <- function(x) {
  y = names(x)
  levels = c(no = "no", y ="yes")
  fct_recode(x, !!!levels)
}

However, neither of these attempts, nor other vectorized approaches I tried, replaces "yes" with the variable (column) name. Is there another vectorized way to replace "yes" with the column name and apply it to a large set of variables?
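A minimal base R sketch of two likely reasons the attempts above never see the column name (the data are rebuilt as a plain data.frame here purely for illustration). In the first attempt, `names(x) = "yes"` inside c() is not valid syntax at all. In the second, c() uses an argument name such as `y` literally instead of evaluating it, and in any case a column extracted from a data frame carries no name, so names(x) inside a function returns NULL. setNames() is one way to build a vector whose names come from a variable:

```r
# Illustrative copy of the example data as a base R data.frame
Health <- data.frame(Anemia = c("yes", "no", "no"),
                     BloodPressure = c("no", "yes", "no"),
                     Asthma = c("no", "no", "yes"))

# An extracted column is an unnamed character vector
col <- Health[["Anemia"]]
names(col)      # NULL

# In c(), the argument name `y` is taken literally, not evaluated
y <- "Anemia"
wrong <- c(no = "no", y = "yes")
names(wrong)    # "no" "y"

# setNames() takes the names from a variable instead
right <- setNames(c("no", "yes"), c("no", y))
names(right)    # "no" "Anemia"
```

This is why the column name has to be passed into the function from outside, which is what cur_column() (dplyr) or iterating over names() (base R) provides.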


Solution

  • You can use cur_column() inside dplyr's across() to get the name of the column currently being processed and use it as the replacement value.

    library(dplyr)
    
    Health %>% mutate(across(everything(), ~ replace(.x, .x == 'yes', cur_column())))
    
    #  Anemia BloodPressure Asthma
    #  <chr>  <chr>         <chr> 
    #1 Anemia no            no    
    #2 no     BloodPressure no    
    #3 no     no            Asthma
    

    In base R, with lapply():

    # Replace "yes" in each column with that column's name
    Health[] <- lapply(names(Health), function(x) 
                       replace(Health[[x]], Health[[x]] == 'yes', x))
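Both answers work because the example columns are character vectors. If the columns are genuine factors, as the title suggests, replace() cannot insert a value that is not already a level: R warns about an invalid factor level and inserts NA. A minimal base R sketch that renames the level itself instead, using a hypothetical helper rename_yes_level() (not part of any package) and assuming the level to rename is literally "yes":

```r
# Hypothetical helper: rename the "yes" level of a factor to `name`,
# leaving the other levels and the factor type unchanged.
rename_yes_level <- function(x, name) {
  x <- as.factor(x)                      # character input becomes a factor
  levels(x)[levels(x) == "yes"] <- name  # relabel the level, not the values
  x
}

# Example data with real factor columns
Health <- data.frame(Anemia = factor(c("yes", "no", "no")),
                     BloodPressure = factor(c("no", "yes", "no")),
                     Asthma = factor(c("no", "no", "yes")))

# Apply the helper to every column, passing each column's name
Health[] <- lapply(names(Health), function(nm) rename_yes_level(Health[[nm]], nm))

str(Health)
```

With dplyr, the same helper can be called per column as `Health %>% mutate(across(everything(), ~ rename_yes_level(.x, cur_column())))`. Relabelling levels() rather than the values keeps each column a factor, and a column without a "yes" level is returned unchanged.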