R dummyVars - dummy variables for a single column


I'm trying to convert a set of transactions into a wide matrix that I can use to fit a regression model.

Trans_id     item_id
  123         ABC
  123         DEF
  123         XYZ
  345         ABC
  ...         ...

I'd like to convert to something like this:

Trans_id     item_ABC    item_DEF   item_XYZ   
  123            1           1          1
  345            1           0          0 

I'm trying to do this using the dummyVars function in caret but can't get it to do what I need.

dv1 <- dummyVars(Trans_id ~ item_id , data = res1)
df2 <- predict(dv1, res1)

This just gives me a list of item_id values with no dummy matrix:

 item_id
   ABC
   DEF
   XYZ
   ABC
   ...

Any suggestions?


Solution

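  • With data.table, dcast can reshape the data to wide format, using length as the aggregation function to count how often each item occurs in each transaction: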

    library(data.table)
    dcast(setDT(data), Trans_id ~ paste0("item_", item_id), length)
    #   Trans_id item_ABC item_DEF item_XYZ
    #1:      123        1        1        1
    #2:      345        1        0        0
    

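    Or, more generally, to get a 0/1 indicator even when an item appears more than once in a transaction (where length alone would return the count):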

    dcast(setDT(data), Trans_id ~ paste0("item_", item_id), function(x) as.integer(length(x)>0))
    

    Data used in the examples:

    data <- structure(list(Trans_id = c(123L, 123L, 123L, 345L), item_id = structure(c(1L, 
    2L, 3L, 1L), .Label = c("ABC", "DEF", "XYZ"), class = "factor")),
     .Names = c("Trans_id", 
    "item_id"), class = "data.frame", row.names = c(NA, -4L))