I have the following R data frame:
id color
001 blue
001 yellow
001 red
002 blue
003 blue
003 yellow
What's the general method to one-hot encode such a data frame into the following:
id blue yellow red
001 1 1 1
002 1 0 0
003 1 1 0
Thank you very much.
Try this. You can create an indicator variable equal to 1 for every observation present in the data and then use pivot_wider() to reshape the values into one column per color. Because you will get NA for the id/color combinations not present in the data, you can replace those with zero using replace(). Here is the code, using tidyverse functions:
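library(dplyr)
library(tidyr)
# Code
dfnew <- df %>%
  mutate(val = 1) %>%                                     # indicator for every observed id/color pair
  pivot_wider(names_from = color, values_from = val) %>%  # one column per color
  replace(is.na(.), 0)                                    # absent combinations become 0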
Output:
# A tibble: 3 x 4
     id  blue yellow   red
  <int> <dbl>  <dbl> <dbl>
1     1     1      1     1
2     2     1      0     0
3     3     1      1     0
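As a variation on the approach above, pivot_wider() can fill the missing combinations itself through its values_fill argument, which makes the replace() step unnecessary. This is a minimal sketch, assuming tidyr 1.1.0 or later (where values_fill accepts a single value) and the same sample data. The distinct() call and the name dfnew2 are additions for illustration: distinct() guards against repeated id/color pairs, which would otherwise make pivot_wider() warn and return list-columns.

```r
library(dplyr)
library(tidyr)

# Sample data (same values as the data used in this answer)
df <- data.frame(
  id = c(1L, 1L, 1L, 2L, 3L, 3L),
  color = c("blue", "yellow", "red", "blue", "blue", "yellow")
)

dfnew2 <- df %>%
  distinct(id, color) %>%   # assumption: duplicates should count once
  mutate(val = 1) %>%       # indicator for every observed pair
  pivot_wider(
    names_from = color,
    values_from = val,
    values_fill = 0         # absent combinations become 0 directly
  )

dfnew2
```

The result should have the same shape as dfnew, with one row per id and one 0/1 column per color.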
Some data used:
#Data
df <- structure(list(id = c(1L, 1L, 1L, 2L, 3L, 3L), color = c("blue",
"yellow", "red", "blue", "blue", "yellow")), class = "data.frame", row.names = c(NA,-6L))
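If you would rather not depend on the tidyverse, a base R alternative is to cross-tabulate id against color with table() and turn the counts into 0/1 indicators. This is only a sketch of a different technique, not the method shown above; the names counts and onehot are hypothetical, and note that the color columns come out in alphabetical order rather than in order of appearance.

```r
# Sample data (same values as the data used in this answer)
df <- data.frame(
  id = c(1L, 1L, 1L, 2L, 3L, 3L),
  color = c("blue", "yellow", "red", "blue", "blue", "yellow")
)

# Cross-tabulate: one row per id, one column per color, cells hold counts
counts <- table(df$id, df$color)

# Convert counts to 0/1 so repeated id/color pairs still give 1
onehot <- as.data.frame((unclass(counts) > 0) + 0L)

# Move the ids from the row names into a regular column
onehot <- cbind(id = as.integer(rownames(onehot)), onehot)
rownames(onehot) <- NULL

onehot
```

The as.integer() conversion assumes numeric ids as in the sample data; for ids stored as strings such as "001", keep rownames(onehot) as character instead.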