I'm trying to convert a set of transactions into a wide (one column per item) matrix to use in a regression model.
Trans_id item_id
123 ABC
123 DEF
123 XYZ
345 ABC
... ...
I'd like to convert it to something like this:
Trans_id item_ABC item_DEF item_XYZ
123 1 1 1
345 1 0 0
I'm trying to do this using the dummyVars function in caret but can't get it to do what I need.
library(caret)
dv1 <- dummyVars(Trans_id ~ item_id, data = res1)
df2 <- predict(dv1, res1)
This just returns the item_id column unchanged, with no dummy matrix:
item_id
ABC
DEF
XYZ
ABC
...
Any suggestions?
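For comparison, here is a minimal base R sketch of the dummy-variable route. It swaps caret's dummyVars for base R's model.matrix and rowsum, and assumes item_id is a factor (the data frame below is rebuilt from the sample rows in the question). Like dummyVars, model.matrix produces one row per transaction line, so those rows still have to be collapsed to one row per Trans_id.

```r
# Sample transactions from the question; item_id is assumed to be a factor
data <- data.frame(
  Trans_id = c(123L, 123L, 123L, 345L),
  item_id  = factor(c("ABC", "DEF", "XYZ", "ABC"))
)

# One row per transaction line, one 0/1 column per item level
# ("- 1" drops the intercept so every level gets its own column)
line_dummies <- model.matrix(~ item_id - 1, data = data)

# Collapse the lines to one row per Trans_id by summing the dummy columns
wide <- rowsum(line_dummies, group = data$Trans_id)

# Turn counts into a 0/1 indicator (repeated items still count as present)
wide <- (wide > 0) * 1L

# Rename columns from "item_idABC" to "item_ABC"
colnames(wide) <- sub("^item_id", "item_", colnames(wide))
print(wide)
```

The result is a numeric matrix with the Trans_id values as row names, which can be passed directly to most modelling functions.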
If we are using data.table, then dcast can be used:
library(data.table)
dcast(setDT(data), Trans_id ~ paste0("item_", item_id), length)
# Trans_id item_ABC item_DEF item_XYZ
#1: 123 1 1 1
#2: 345 1 0 0
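With length as the aggregation function, dcast counts the rows falling into each Trans_id/item cell. If data.table is not available, a rough base R equivalent (a swapped-in technique, not part of the answer above) is to cross-tabulate with table() and convert the result to a data frame. The sample data is again rebuilt from the question.

```r
# Sample transactions from the question
data <- data.frame(
  Trans_id = c(123L, 123L, 123L, 345L),
  item_id  = c("ABC", "DEF", "XYZ", "ABC")
)

# Count rows per (Trans_id, item_id) pair, like dcast(..., length)
counts <- table(data$Trans_id, data$item_id)

# Table -> data frame with one column per item, prefixed with "item_"
items <- as.data.frame.matrix(counts)
names(items) <- paste0("item_", names(items))

# Move the Trans_id row names into a proper column
wide <- data.frame(Trans_id = as.integer(rownames(items)), items,
                   row.names = NULL, check.names = FALSE)
print(wide)
```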
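Or, a more general approach that always returns a 0/1 indicator, even when the same item appears more than once in a transaction (where length would return a count greater than 1):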
dcast(setDT(data), Trans_id ~ paste0("item_", item_id), function(x) as.integer(length(x)>0))
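To see why the indicator version matters, here is a small base R sketch using hypothetical data in which transaction 123 contains ABC twice. A plain count reports 2 for that cell, while thresholding at zero (the same idea as as.integer(length(x) > 0)) reports 1.

```r
# Hypothetical data: transaction 123 buys ABC twice
dup <- data.frame(
  Trans_id = c(123L, 123L, 123L, 123L, 345L),
  item_id  = c("ABC", "ABC", "DEF", "XYZ", "ABC")
)

# Counts per cell: the (123, ABC) cell is 2
counts <- table(dup$Trans_id, dup$item_id)

# Presence indicator: any count above zero becomes 1
presence <- (counts > 0) * 1L

print(counts)
print(presence)
```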
Data used in the examples:
data <- structure(list(Trans_id = c(123L, 123L, 123L, 345L),
                       item_id = structure(c(1L, 2L, 3L, 1L),
                                           .Label = c("ABC", "DEF", "XYZ"),
                                           class = "factor")),
                  .Names = c("Trans_id", "item_id"),
                  class = "data.frame", row.names = c(NA, -4L))