
How to relevel a factor variable with over 500 levels efficiently in R


I haven't been able to find any answers to this specific problem:

I have a factor variable with over 500 levels that I need to relevel to just 2 levels (1/0).

Many of the levels start with the same character string e.g. "Woman's mother or sister:"

Is there a way to use starts_with to relevel all of these levels at the same time, instead of doing them one by one as I have been with this code:

   levels(DF1$MedicalCondition)[levels(DF1$MedicalCondition) == "Woman's mother or sister: sister"] <- "1"

Any help appreciated, thank you!


Solution

  • tidyselect::starts_with is specifically written for use on column names within dplyr-type functions, but you can use the base R startsWith:

    levels(DF1$MedicalCondition)[
      startsWith(levels(DF1$MedicalCondition), "Woman's mother or sister")
    ] <- "1"
    

    You can also match more general patterns with grepl or stringr::str_detect, for example a single regular expression that covers several different prefixes at once.
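
To get all the way down to two levels, one option is to replace the whole levels vector in a single assignment, since assigning duplicated labels to levels() merges them. The snippet below is only a sketch on made-up data: the factor cond stands in for DF1$MedicalCondition, and the "Woman's aunt" and "No family history" levels are hypothetical, added just to show the regex variant and the non-matching case.

```r
# Toy factor standing in for DF1$MedicalCondition (hypothetical data)
cond <- factor(c(
  "Woman's mother or sister: sister",
  "Woman's mother or sister: mother",
  "Woman's aunt: maternal",
  "No family history"
))

# Prefix match: levels that start with the string become "1", all others "0".
# Assigning duplicated labels to levels() merges those levels.
by_prefix <- cond
levels(by_prefix) <- ifelse(
  startsWith(levels(by_prefix), "Woman's mother or sister"), "1", "0"
)

# Regex match: "^" anchors at the start of the level, "|" separates alternatives
by_regex <- cond
levels(by_regex) <- ifelse(
  grepl("^Woman's (mother or sister|aunt)", levels(by_regex)), "1", "0"
)

table(by_prefix)
table(by_regex)
```

Working on copies (by_prefix, by_regex) keeps the original factor intact, so you can check the result with table() before overwriting the real column. If you prefer stringr, stringr::str_detect(levels(by_regex), pattern) should be a drop-in replacement for the grepl call.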