Is there a simple way to change multiple names to one single name in R?


This is a simplified version of my data frame. The Color column is of type character.

|ID|Color |
|--|------| 
|1 |Brown |
|2 |Black |
|3 |Red   |
|4 |Blue  |
|5 |Black |
|6 |Green |
|7 |Brown |
|8 |Red   |
|9 |Yellow|
|10|Violet|

I would like to replace every color that is NOT Black, Brown or Red with "Other". I have a piece of code that works:

library(tidyverse)
df_clean <- df %>%
   mutate(Color = case_when(
      # one line per color that should become "Other"
      str_detect(Color, "Blue") ~ "Other",
      str_detect(Color, "Green") ~ "Other",
      str_detect(Color, "Yellow") ~ "Other",
      str_detect(Color, "Violet") ~ "Other",
      # everything else (Black, Brown, Red) is kept as is
      TRUE ~ Color
   ))

But I would have to do this for every color (my full dataset has more than 50 color names in more than 160,000 rows). Is there a simpler way to do this? For example, could I use negate() or ! somewhere in the code to say "if it's not Black, Brown or Red, change it to Other"?


Solution

  • You can replace the colors using %in%, negated with !, so that every value not in the list is overwritten:

    df$Color[!df$Color %in% c('Black', 'Brown', 'Red')] <- 'Other'
    

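    You can also use fct_other() from forcats, which keeps the listed values and collapses everything else into "Other". Note that the result is a factor, not a character vector.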

    library(dplyr)
    library(forcats)
    
    df %>% mutate(Color = fct_other(Color, keep = c('Black', 'Brown', 'Red')))
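
If you would rather stay inside a dplyr pipeline and keep Color as a character column, the same %in% test can go into if_else(). This is only a minimal sketch: the sample data is rebuilt from the table in the question, and keep_colors is a hypothetical helper name, not something from the original code.

```r
library(dplyr)

# Sample data rebuilt from the table in the question
df <- data.frame(
  ID = 1:10,
  Color = c("Brown", "Black", "Red", "Blue", "Black",
            "Green", "Brown", "Red", "Yellow", "Violet"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the colors that should be left untouched
keep_colors <- c("Black", "Brown", "Red")

# Keep the listed colors, replace everything else with "Other"
df_clean <- df %>%
  mutate(Color = if_else(Color %in% keep_colors, Color, "Other"))

# Count the rows per color after the replacement
table(df_clean$Color)
```

With the ten sample rows this leaves Black, Brown and Red with two rows each and puts the remaining four rows into "Other". One thing to be aware of: NA %in% keep_colors is FALSE, so missing values would also become "Other" unless you handle them separately.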