r dplyr tidyverse dummy-variable

Creating dummy variables as counts using tidyverse/dplyr


Let's say I have some data as follows:

ID    FRUIT
001   apple
002   grape
001   banana
002   apple
003   apple
001   apple

I would like to turn this into columns, like dummy variables, except that the dummies are counts of each value in the FRUIT column. So, if apple appears two times in the FRUIT column for ID 001, the new column apple (or FRUIT_apple) is 2 for that ID.

Expected output:

ID   FRUIT_apple  FRUIT_grape  FRUIT_banana
001            2            0             1
002            1            1             0
003            1            0             0

I'm not attached to these column names; whatever is easier.


Solution

  • Using reshape2, though pretty much any package that reshapes data from long to wide would work:

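        library(reshape2)

        # Cast long to wide: one row per ID, one column per FRUIT,
        # using length() to count the rows in each ID/FRUIT cell
        df <- dcast(fruitData, ID ~ FRUIT, fun.aggregate = length, value.var = "FRUIT")

        > df
          ID apple banana grape
        1  1     2      1     0
        2  2     1      0     1
        3  3     1      0     0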
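
Since the question asks for tidyverse/dplyr, the same long-to-wide count can also be sketched with dplyr and tidyr. This is a minimal sketch and not part of the original answer. It assumes tidyr 1.1.0 or later (for `pivot_wider()` with a scalar `values_fill`), rebuilds the sample data under the name `fruitData` with ID stored as character, and uses `fruitCounts` as a purely illustrative result name.

```r
library(dplyr)
library(tidyr)

# Sample data from the question (ID kept as character to preserve "001")
fruitData <- data.frame(
  ID = c("001", "002", "001", "002", "003", "001"),
  FRUIT = c("apple", "grape", "banana", "apple", "apple", "apple"),
  stringsAsFactors = FALSE
)

fruitCounts <- fruitData %>%
  # One row per ID/FRUIT pair, with the number of occurrences in column n
  count(ID, FRUIT) %>%
  # Spread the fruits into columns; pairs that never occur become 0
  pivot_wider(
    names_from = FRUIT,
    values_from = n,
    names_prefix = "FRUIT_",
    values_fill = 0
  )

print(fruitCounts)
```

Here `names_prefix` produces the FRUIT_apple style names from the expected output; dropping that argument gives plain apple, banana, and grape columns, as in the reshape2 result.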