Convert categories in one column to multiple columns coded as 1 or 0 if present or absent in R


I have data that looks like the following:

library(dplyr)
library(tidyr)
a <- tibble(type=c("A", "A", "B", "B", "C", "D"))
print(a)
# A tibble: 6 x 1
type 
<chr>
1 A    
2 A    
3 B    
4 B    
5 C    
6 D

Here, type contains categorical information. I am trying to convert each category in type into its own column, coded as 1 if that category is present in the row and 0 if not. The final result would look like:

b <- tibble(A=c(1, 1, 0, 0, 0, 0),
            B=c(0, 0, 1, 1, 0, 0),
            C=c(0, 0, 0, 0, 1, 0),
            D=c(0, 0, 0, 0, 0, 1))

   # A tibble: 6 x 4
     A     B     C     D
   <dbl> <dbl> <dbl> <dbl>
1    1.    0.    0.    0.
2    1.    0.    0.    0.
3    0.    1.    0.    0.
4    0.    1.    0.    0.
5    0.    0.    1.    0.
6    0.    0.    0.    1.

I have tried the following:

a$dat <- 1
spread(a, type, dat)

However, this does not work because some categories occur more than once, so spread() has no way to tell the duplicate rows apart. Any help would be appreciated. Thank you!


Solution

  • This is likely a duplicate; what you are doing is usually referred to as "one-hot encoding". One way is to use model.matrix. The - 1 in the formula drops the intercept, so every level of type gets its own 0/1 column:

    library(tidyverse)
    
    a %>% 
      model.matrix(~ . - 1, data = .) %>%
      as_tibble()
    
    # A tibble: 6 x 4
      typeA typeB typeC typeD
      <dbl> <dbl> <dbl> <dbl>
    1     1     0     0     0
    2     1     0     0     0
    3     0     1     0     0
    4     0     1     0     0
    5     0     0     1     0
    6     0     0     0     1
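
The model.matrix output above names its columns typeA, typeB, and so on, while the question asks for A, B, C, D. The following is a minimal sketch of two ways to get those names, not the answerer's own code. It assumes the data from the question, with only the type column present. The names mm, b1, b2, and the helper column id are made up for illustration. The first approach strips the "type" prefix from the model.matrix column names; the second keeps the asker's spread() idea and adds a row id so that duplicate categories can be told apart.

```r
library(dplyr)
library(tidyr)

# Data from the question
a <- tibble(type = c("A", "A", "B", "B", "C", "D"))

# Approach 1: model.matrix, then strip the "type" prefix from the column names
mm <- model.matrix(~ type - 1, data = a)
colnames(mm) <- sub("^type", "", colnames(mm))
b1 <- as_tibble(mm)

# Approach 2: add a row id (hypothetical helper column) so that each row is
# uniquely identified, spread, fill the missing cells with 0, then drop the id
b2 <- a %>%
  mutate(id = row_number(), dat = 1) %>%
  spread(type, dat, fill = 0) %>%
  select(-id)
```

Both b1 and b2 should be 6 x 4 tibbles with columns A, B, C, and D. Note that spread() is superseded in recent tidyr releases in favour of pivot_wider(); it is used here only because the question starts from it.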