r dataframe dummy-variable

Change the structure of a data frame to dummy variables


I am looking for a way to change the structure of a data frame so that I can run a beta regression on it afterwards. At the moment the data frame looks like this:

rating   playerID
   0.6         a1
    NA         b2
   0.9         a4
    NA         b5
     0         a3
    NA         b2

I need it to look like this:

rating   a1   a2   a3   a4   a5   b1   b2   b3   b4   b5
   0.6    1    0    0    0    0    0   -1    0    0    0
   0.9    0    0    0    1    0    0    0    0    0   -1
     0    0    0    1    0    0    0   -1    0    0    0

It is not necessary to use -1 for the "bX" variables (1 works as well). The idea is to take each pair of rows (player "aX" followed by player "bX") and encode both players as dummy variables on a single line, together with the rating of player "aX".

Thank you for any ideas and inputs.


Solution

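  • Here's a mostly base R solution using table() (dplyr::lag() is the only non-base call), assuming the factor levels a1 to b5 are already present in playerID. Lagging rating by one row gives each "b" row the rating of the "a" row directly above it, so the "a" table and the "b" table share the same rating rows and can be subtracted: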

    table(subset(DF, grepl("a", playerID))) -
     table(subset(within(DF, rating <- dplyr::lag(rating)), grepl("b", playerID)))
    
    #>       playerID
    #> rating a1 a2 a3 a4 a5 b1 b2 b3 b4 b5
    #>    0    0  0  1  0  0  0 -1  0  0  0
    #>    0.6  1  0  0  0  0  0 -1  0  0  0
    #>    0.9  0  0  0  1  0  0  0  0  0 -1
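
Because table() orders its rows by rating value and counts pairs with the same rating in one row, a variant that keeps one row per pair in the original order may be more convenient as regression input. The following is only a sketch in base R under two assumptions that are not stated in the question: the rows alternate strictly ("a" row, then its "b" row), and playerID is a factor with levels a1 to b5. The names lv, a_rows, b_rows, X and wide are made up for illustration.

```r
# Example data from the question. The factor levels a1..b5 are set explicitly
# so that players who never appear still get a column.
lv <- c(paste0("a", 1:5), paste0("b", 1:5))
DF <- data.frame(
  rating   = c(0.6, NA, 0.9, NA, 0, NA),
  playerID = factor(c("a1", "b2", "a4", "b5", "a3", "b2"), levels = lv)
)

# Assumption: rows alternate strictly, each "a" row followed by its "b" row.
a_rows <- seq(1, nrow(DF), by = 2)
b_rows <- a_rows + 1

# One row per pair, one column per player, all zeros to start with.
X <- matrix(0, nrow = length(a_rows), ncol = length(lv),
            dimnames = list(NULL, lv))

# Index the matrix with two-column (row, column) index matrices:
# +1 for the "a" player of each pair, -1 for the "b" player.
pair <- seq_along(a_rows)
X[cbind(pair, as.integer(DF$playerID[a_rows]))] <- 1
X[cbind(pair, as.integer(DF$playerID[b_rows]))] <- -1

# Attach the rating of the "a" player to each pair.
wide <- data.frame(rating = DF$rating[a_rows], X)
wide
```

If 1 is preferred over -1 for the "bX" columns, changing the second assignment to `<- 1` should be enough.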