I am trying to get the dummy variables for the following table:
df:
Value1 var1 var2 var3 var4
9.330154398 HomeATL AwayHOU HomeEast AwayWest
32.43881489 AwaySDN HomeATL HomeWest AwayWest
54.77178387 AwayLAN HomeATL AwayEast HomeSame
54.77178387 AwayLAN HomeATL AwayWest HomeEast
The column var1
and var2
share the same levels. On the other hand, the column var3
and var4
whare their levels as well. Therefore, I need that during the creation of the dummy variables, the new columns created should not have repeated levels. I mean, in the example of var3 and var4, for the 1st and 3rd row, both have AwayWest
, so I only need to fill 1 column named AwayWest
with a number 1 on each row.
My desired output is:
Value1 HomeEast HomeWest AwayEast AwayWest HomeSame HomeATL AwayHOU AwaySDN AwayLAN
9.330154398 1 0 0 1 0 1 1 0 0
32.43881489 0 1 0 1 0 1 0 1 0
54.77178387 0 0 1 0 1 1 0 0 1
54.77178387 1 0 0 1 0 1 0 0 1
I tried creating a new column of 1 (col1
) for each column to be transformed:
spread(df,var1, col1) %>%
spread(var2, col1)%>%
spread(var3, col1)%>%
spread(var1, col1)
However it is not working.
Thanks
A base R
option would be table
tbl <- +(table(c(col(df1[-1])), unlist(df1[-1]) ) > 0)
cbind(df1[1], as.data.frame.matrix(tbl))
Or using tidyverse
library(tidyverse)
rownames_to_column(df1, 'rn') %>%
gather(key, val, var1:var4) %>%
count(rn, val) %>%
spread(val, n, fill = 0) %>%
select(-rn) %>%
bind_cols(df1[1], .)
df1 <- structure(list(Value1 = c(9.330154398, 32.43881489, 54.77178387,
54.77178387), var1 = c("HomeATL", "AwaySDN", "AwayLAN", "AwayLAN"
), var2 = c("AwayHOU", "HomeATL", "HomeATL", "HomeATL"), var3 = c("HomeEast",
"HomeWest", "AwayEast", "AwayWest"), var4 = c("AwayWest", "AwayWest",
"HomeSame", "HomeEast")), class = "data.frame", row.names = c(NA,
-4L))