
Convert a relational data frame into indicator variables


I have a relational data frame with 2 columns, customer and purchase. I would like a data frame with one row for each distinct customer and one column for each product, holding an indicator variable (1 or 0) that shows whether or not that customer has purchased that product.

Example:

df <- data.frame(customer=c("A", "A", "B", "B"), purchase = c("Milk", "Eggs", "Juice", "Milk"))
  customer purchase
1        A     Milk
2        A     Eggs
3        B    Juice
4        B     Milk

I want:

  customer Milk Eggs Juice
1        A    1    1     0
2        B    1    0     1

Solution

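  • We can use dcast from reshape2 with length as the aggregation function, which counts the rows for each customer/purchase pair. The product columns come out in alphabetical order, and a customer who bought the same product more than once would get a count above 1 rather than a 0/1 indicator.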

    library(reshape2)
    dcast(df, customer~purchase, length, value.var='purchase')
    #    customer Eggs Juice Milk
    #1        A    1     0    1
    #2        B    0     1    1
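
If the data may contain repeat purchases, or the columns should follow the order in which products first appear, a base R alternative is possible. The following is only a sketch, not part of the original answer: the names counts, ind and result are illustrative, and it assumes the same df as in the question.

```r
# Same example data as in the question.
df <- data.frame(customer = c("A", "A", "B", "B"),
                 purchase = c("Milk", "Eggs", "Juice", "Milk"))

# Cross-tabulate customers against products; each cell is a purchase count.
counts <- table(df$customer, df$purchase)

# Cap the counts at 1 so repeat purchases still give a 0/1 indicator.
ind <- matrix(as.integer(counts > 0),
              nrow = nrow(counts),
              dimnames = dimnames(counts))

# Reorder the product columns by first appearance in the data.
ind <- ind[, unique(as.character(df$purchase)), drop = FALSE]

# Move the customer labels from the row names into a regular column.
result <- data.frame(customer = rownames(ind), ind, row.names = NULL)
result
#   customer Milk Eggs Juice
# 1        A    1    1     0
# 2        B    1    0     1
```

Note that data.frame() will rewrite product names that are not syntactically valid column names (for example, names containing spaces) unless check.names = FALSE is passed.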