I have the following data frame:
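library(tidyverse)
df <- tibble(a = c(1, 2, 3, 4, 5),
             b = c("Y", "N", "N", "Y", "N"),
             c = c("A", "B", "C", "A", "B"))
# Convert every character column to a factor
# (funs() is deprecated, so the function is passed directly via across())
df <- df %>%
  mutate(across(where(is.character), as.factor))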
The output of df:
      a b     c
  <dbl> <fct> <fct>
1     1 Y     A
2     2 N     B
3     3 N     C
4     4 Y     A
5     5 N     B
I would like to recode the levels of all factor variables (b and c) to integers: if a factor has only two levels, they should be recoded to {0, 1}; otherwise, to {1, 2, 3, ...}. So the output should be:
      a b     c
  <dbl> <fct> <fct>
1     1 1     1
2     2 0     2
3     3 0     3
4     4 1     1
5     5 0     2
I can recode variables separately (one by one), but I wonder if there is a more convenient approach.
One dplyr option could be:
df %>%
  mutate(across(where(is.factor),
                ~ if (n_distinct(.) == 2) factor(., labels = 0:1) else factor(., labels = 1:n_distinct(.))))
      a b     c
  <dbl> <fct> <fct>
1     1 1     1
2     2 0     2
3     3 0     3
4     4 1     1
5     5 0     2
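If the same recoding is needed in several places, the logic could be pulled into a small named function. The following is only a sketch: `recode_levels` is a hypothetical helper name, not part of dplyr, and it assumes that new labels should follow the existing (by default alphabetical) level order and that unused levels should be dropped first.

```r
library(dplyr)

# Hypothetical helper (not a dplyr function): relabel a factor's levels
# as 0/1 when it has exactly two levels, otherwise as 1, 2, 3, ...
recode_levels <- function(x) {
  x <- droplevels(x)                        # assumption: ignore unused levels
  k <- nlevels(x)
  levels(x) <- if (k == 2) 0:1 else seq_len(k)
  x
}

# Same example data as in the question, built with factors directly
df <- tibble(a = c(1, 2, 3, 4, 5),
             b = factor(c("Y", "N", "N", "Y", "N")),
             c = factor(c("A", "B", "C", "A", "B")))

result <- df %>%
  mutate(across(where(is.factor), recode_levels))
```

The columns in `result` are still factors whose labels happen to be digits. If plain integer columns are wanted instead, wrapping the helper's return value in `as.integer(as.character(...))` should do the conversion.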