I have a relational data frame with two columns, customer and purchase. I would like a data frame with one row per distinct customer and one column per product, holding indicator variables that show whether or not that customer has purchased that product.
Example:
df <- data.frame(customer = c("A", "A", "B", "B"), purchase = c("Milk", "Eggs", "Juice", "Milk"))
  customer purchase
1        A     Milk
2        A     Eggs
3        B    Juice
4        B     Milk
I want:
  customer Milk Eggs Juice
1        A    1    1     0
2        B    1    0     1
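The target layout is a customer-by-product contingency table with counts capped at 1. A minimal base R sketch of that idea, assuming only the `df` above; the helper names `products`, `counts`, `indicators` and `wide` are illustrative, and the column order is forced to the order of first appearance (Milk, Eggs, Juice) to match the desired output:

```r
df <- data.frame(customer = c("A", "A", "B", "B"),
                 purchase = c("Milk", "Eggs", "Juice", "Milk"))

# Fix the product order to the order of first appearance;
# without explicit levels, table() would sort the columns alphabetically.
products <- factor(df$purchase, levels = unique(df$purchase))

# Cross-tabulate customers against products: each cell is a purchase count.
counts <- table(df$customer, products)

# Turn counts into 0/1 indicators (a repeat purchase still gives 1).
indicators <- (counts > 0) * 1L

# Move the customer names out of the row names into a proper column.
wide <- data.frame(customer = rownames(indicators),
                   as.data.frame.matrix(indicators),
                   row.names = NULL)
print(wide)
```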
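We can use `dcast` from reshape2 with `length` as the aggregation function. It counts the rows for each customer/product pair, so the result is 0/1 as long as no customer buys the same product twice; the product columns come out in alphabetical order.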
library(reshape2)
dcast(df, customer~purchase, length, value.var='purchase')
#   customer Eggs Juice Milk
# 1        A    1     0    1
# 2        B    0     1    1
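If the data can contain repeat purchases, `length` would report 2 instead of 1. One way to keep strict indicators is to pass an aggregation function that only checks whether any row exists. The sketch below assumes reshape2 is installed and adds a hypothetical extra row in which B buys Milk a second time; `any_purchase` is an illustrative helper, not part of reshape2. Because `dcast` fills empty cells by default with the aggregation function applied to a zero-length vector, those cells become 0:

```r
library(reshape2)

# Hypothetical data: B buys Milk twice.
df <- data.frame(customer = c("A", "A", "B", "B", "B"),
                 purchase = c("Milk", "Eggs", "Juice", "Milk", "Milk"))

# Returns 1L if the customer bought the product at least once, else 0L.
# On a zero-length vector it returns 0L, which dcast uses as the fill value.
any_purchase <- function(x) as.integer(length(x) > 0)

res <- dcast(df, customer ~ purchase, fun.aggregate = any_purchase,
             value.var = "purchase")
print(res)
```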