I have a column with almost 100 string categories that I would like to group (recode) into fewer categories. I am trying to figure out the easiest way to do this; I thought about converting the column to a factor or to numeric to make the operations easier. The categories are not in any particular order, and I can't find a good way to recode them. Here is an example:
Suppose I have 15 string categories:
cat1 <- LETTERS[seq(1,15)]
df <- as.data.frame(cat1)
I turned it into numeric:
df$cat2 <- as.numeric(as.factor(df$cat1))
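A quick base-R check of what that conversion produces, as a minimal sketch. as.factor() sorts the unique values to build its levels (alphabetically for these single letters; in general the order follows the locale's collation), and as.numeric() returns each value's position in those levels. With real category names the numbers therefore follow sorted order, not any meaning in the data, so it can be simpler to recode the strings directly.

```r
cat1 <- LETTERS[1:15]
df <- data.frame(cat1)

# as.factor() sorts the unique values to build its levels;
# as.numeric() then returns each value's position in those levels.
df$cat2 <- as.numeric(as.factor(df$cat1))

# Show the code -> category mapping
print(levels(as.factor(df$cat1)))
print(df[c(1, 2, 15), ])
```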
This is what I tried to do:
df <- df %>% mutate(cat3 = case_when(cat2 == c(1:5,7,9) ~ 1,
cat2 == c(6,8,10,13) ~ 2,
cat2 == (11:12,14:15) ~ 3))
I also tried:
df$cat3[df$cat2 == c(1:5, 7,9)] <- 1
I tried other code too, but nothing seems to work. Suppose I want to collapse the categories into the following new groups:
(1:5, 7,9) (6,8,10,13) (11:12,14:15)
What is the best way to do it?
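Your case_when syntax needs a small tweak: use %in% instead of ==, and wrap the last set of values in c(). The == operator compares element by element (recycling the shorter vector), so it does not test whether cat2 is one of the listed values; %in% does: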
df %>% mutate(cat3 = case_when(cat2 %in% c(1:5, 7, 9) ~ 1,
cat2 %in% c(6,8,10,13) ~ 2,
cat2 %in% c(11:12,14:15) ~ 3))
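The same fix applies to the base-R subassignment from the question. A minimal sketch using only base R (no dplyr), which first contrasts == and %in% on the 15-row example and then builds cat3 with %in%:

```r
df <- data.frame(cat1 = LETTERS[1:15])
df$cat2 <- as.numeric(as.factor(df$cat1))

# `==` compares element by element, recycling the 7-element right-hand side
# along the 15 rows (with a warning, since 15 is not a multiple of 7).
# It only matches rows 1-5 here and misses 7 and 9.
eq <- suppressWarnings(df$cat2 == c(1:5, 7, 9))

# `%in%` tests membership, which is what the grouping needs
inn <- df$cat2 %in% c(1:5, 7, 9)
print(rbind(eq, inn))

# Base-R version of the recode using %in%
df$cat3 <- NA_real_
df$cat3[df$cat2 %in% c(1:5, 7, 9)]    <- 1
df$cat3[df$cat2 %in% c(6, 8, 10, 13)] <- 2
df$cat3[df$cat2 %in% c(11:12, 14:15)] <- 3
print(df)
```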
But you can also use the single-vector version, case_match() (available in dplyr 1.1.0 and later):
df %>% mutate(cat3 = case_match(cat2,
c(1:5, 7, 9) ~ 1,
c(6,8,10,13) ~ 2,
c(11:12,14:15) ~ 3))
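With close to 100 real categories, writing every condition by hand gets long. One alternative, sketched here in base R as an assumption about how you might organize the groups, is a named lookup vector indexed by the original strings, which also skips the numeric conversion. The list name groups and its contents are hypothetical illustrations built from the 15-letter example:

```r
df <- data.frame(cat1 = LETTERS[1:15])

# Hypothetical group definitions: each element lists the original
# categories that should collapse into that new group.
groups <- list(
  `1` = LETTERS[c(1:5, 7, 9)],
  `2` = LETTERS[c(6, 8, 10, 13)],
  `3` = LETTERS[c(11:12, 14:15)]
)

# Flatten into a named lookup vector: names are the old categories,
# values are the new group ids.
lookup <- setNames(
  rep(as.numeric(names(groups)), lengths(groups)),
  unlist(groups, use.names = FALSE)
)

# Index the lookup by the original strings; any category missing
# from the lookup comes back as NA.
df$cat3 <- unname(lookup[df$cat1])
print(df)
```

Keeping the mapping in one place (or in a two-column old/new table that you merge onto the data) makes it easier to review and edit than a long chain of conditions.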