I have data that looks like the following:
library(dplyr)
library(tidyr)
a <- data_frame(type=c("A", "A", "B", "B", "C", "D"))
print(a)
# A tibble: 6 x 1
type
<chr>
1 A
2 A
3 B
4 B
5 C
6 D
Here, type contains categorical information. I am trying to convert each category in type into its own column, coded as 1 if that type is present in the row and 0 if not; thus, the final result would look like:
b <- data_frame(A=c(1, 1, 0, 0, 0, 0),
B=c(0, 0, 1, 1, 0, 0),
C=c(0, 0, 0, 0, 1, 0),
D=c(0, 0, 0, 0, 0, 1))
# A tibble: 6 x 4
A B C D
<dbl> <dbl> <dbl> <dbl>
1 1. 0. 0. 0.
2 1. 0. 0. 0.
3 0. 1. 0. 0.
4 0. 1. 0. 0.
5 0. 0. 1. 0.
6 0. 0. 0. 1.
I have tried the following:
a$dat <- 1
spread(a, type, dat)
However, this does not work because some of the categories appear more than once, so spread cannot match each value to a unique row. Any help would be appreciated. Thank you!
This is likely a duplicate; what you are doing is usually referred to as "one-hot encoding". One way is to leverage model.matrix, dropping the intercept (the - 1 in the formula) so that every level of type gets its own column:
library(tidyverse)
a %>%
model.matrix(~ . - 1, data = .) %>%
as_tibble()  # as_data_frame() is deprecated in current tibble versions
# A tibble: 6 x 4
typeA typeB typeC typeD
<dbl> <dbl> <dbl> <dbl>
1 1 0 0 0
2 1 0 0 0
3 0 1 0 0
4 0 1 0 0
5 0 0 1 0
6 0 0 0 1
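If you would rather stay with the spread attempt from the question, one possible fix is to give each row a unique key first, so that repeated categories no longer collide. The following is only a minimal sketch under that assumption: the id column is a helper name introduced here for illustration (it is not in the original data), and tibble() is used in place of data_frame().

```r
library(dplyr)
library(tidyr)

a <- tibble(type = c("A", "A", "B", "B", "C", "D"))

b <- a %>%
  mutate(id = row_number(),          # hypothetical helper key: makes every row unique
         dat = 1) %>%                # marker value, as in the original attempt
  spread(type, dat, fill = 0) %>%    # categories absent from a row become 0 instead of NA
  select(-id)                        # drop the helper key again

print(b)
```

Unlike the model.matrix route, the columns keep the bare category names (A, B, C, D) rather than typeA, typeB, and so on. In newer tidyr versions, pivot_wider(names_from = type, values_from = dat, values_fill = 0) should do the same job as the spread call.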