Let's say I have some data as follows:
ID FRUIT
001 apple
002 grape
001 banana
002 apple
003 apple
001 apple
I would like to make columns out of this, like dummy variables, except that the dummies are counts of each value in the FRUIT column. So, if ID 001 has apple appear two times in the FRUIT column, the new column apple (or FRUIT_apple) is 2.
Expected output:
ID FRUIT_apple FRUIT_grape FRUIT_banana
001 2 0 1
002 1 1 0
003 1 0 0
I'm not attached to these column names; whatever is easier is fine.
Using reshape2, though you could use pretty much any package that lets you reshape from long to wide:
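library(reshape2)
# Cast long to wide; length counts the rows for each ID/FRUIT pair
df <- dcast(fruitData, ID ~ FRUIT, fun.aggregate = length, value.var = "FRUIT")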
> df
ID apple banana grape
1 1 2 1 0
2 2 1 0 1
3 3 1 0 0
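
If you would rather avoid an extra package, the same counts can probably be obtained with base R's table(). The following is only a sketch under some assumptions: fruitData is rebuilt here from the data in the question, ID is stored as character so the leading zeros survive, and the FRUIT_ prefix is added by hand to match the requested column names.

```r
# Assumed reconstruction of the question's data (not the asker's actual object)
fruitData <- data.frame(
  ID = c("001", "002", "001", "002", "003", "001"),
  FRUIT = c("apple", "grape", "banana", "apple", "apple", "apple"),
  stringsAsFactors = FALSE
)

# Cross-tabulate ID against FRUIT; each cell is the number of matching rows
counts <- table(fruitData$ID, fruitData$FRUIT)

# Convert the contingency table to a wide data frame (one column per fruit)
wide <- as.data.frame.matrix(counts)

# Prefix the fruit columns and move the IDs from row names into a column
names(wide) <- paste0("FRUIT_", names(wide))
wide <- cbind(ID = rownames(wide), wide)
rownames(wide) <- NULL

wide
```

Note that table() orders the fruit columns alphabetically (apple, banana, grape), just like the dcast output above, so reorder the columns afterwards if the order matters.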