How to quickly create dummy variables from list in R


I'm new to R and am having trouble with a fairly simple task. I have a data frame called "Data" that looks like this:

           Group       Score.Diff
Row 1   Kyle, Steve      15
Row 2   Matthew, Tony    12 
...     ...              ...            
Row n   Anthony, Zack    -10

I also have a vector called "Player.Names" containing every unique name that appears somewhere in Data$Group, like so:

        Names
Row 1   Anthony
Row 2   Kyle
...     ...
Row n   Zack

What I'm struggling with is adding a new column to "Data" for each unique name, containing 1 if that name appears in the row's Data$Group and 0 if it does not. The desired output is shown below:

           Group       Score.Diff  Anthony  Kyle  Steve ...  Zack
Row 1   Kyle, Steve      15           0      1     1    ...   0
Row 2   Matthew, Tony    12           0      0     0    ...   0
...     ...              ...         ...    ...   ...   ...  ...
Row n   Anthony, Zack    -10          1      0     0    ...   1

Solution

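  • We can loop over the 'Names' column of 'df2' with sapply, using each name as a grepl pattern against the 'Group' column of 'df1'. That gives one logical vector per name, which as.integer coerces to 0/1, and cbind attaches the resulting columns to the first dataset ('df1'). Here 'df1' stands in for Data and 'df2' for Player.Names.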

    cbind(df1, sapply(df2$Names, function(x) as.integer(grepl(x, df1$Group))))
    #               Group Score.Diff Anthony Kyle Zack
    #Row 1   Kyle, Steve         15       0    1    0
    #Row 2 Matthew, Tony         12       0    0    0
    #Row n Anthony, Zack        -10       1    0    1
    

    ###data

    df1 <- data.frame(Group = c("Kyle, Steve", "Matthew, Tony", "Anthony, Zack"),
                      Score.Diff = c(15L, 12L, -10L),
                      row.names = c("Row 1", "Row 2", "Row n"),
                      stringsAsFactors = FALSE)
    
    df2 <- data.frame(Names = c("Anthony", "Kyle", "Zack"),
                      row.names = c("Row 1", "Row 2", "Row n"),
                      stringsAsFactors = FALSE)
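
One caveat with the grepl approach: each name is treated as a regular expression and matched as a substring, so "Kyle" would also flag a row containing "Kyler", and a name with regex metacharacters (such as ".") could match unexpectedly. A minimal sketch that avoids this splits each Group into individual names and tests exact membership with %in%. It assumes names within Group are separated by commas, with surrounding whitespace trimmed. The Data and Player.Names objects below are hypothetical reconstructions of the question's data, with "Steve" added to Player.Names to match the desired output.

```r
# Hypothetical reconstruction of the question's data
Data <- data.frame(Group = c("Kyle, Steve", "Matthew, Tony", "Anthony, Zack"),
                   Score.Diff = c(15L, 12L, -10L),
                   stringsAsFactors = FALSE)
Player.Names <- c("Anthony", "Kyle", "Steve", "Zack")

# Split each Group string on commas and trim whitespace (assumes "," separators)
members <- lapply(strsplit(Data$Group, ","), trimws)

# One row per observation, one 0/1 column per player
dummies <- matrix(0L, nrow = nrow(Data), ncol = length(Player.Names),
                  dimnames = list(NULL, Player.Names))
for (i in seq_along(members)) {
  # Exact match against this row's names, so "Kyle" does not match "Kyler"
  dummies[i, ] <- as.integer(Player.Names %in% members[[i]])
}

result <- cbind(Data, dummies)
print(result)
```

Filling a preallocated matrix row by row keeps the shape correct even when Data has a single row or Player.Names has a single name. In those cases sapply would simplify its result to a plain vector instead of a matrix.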