Tags: r, plyr, reshape, reshape2

Reshape multiple categorical variables to binary response variables


I am trying to convert the following format:

mydata <- data.frame(movie = c("Titanic", "Departed"), 
                     actor1 = c("Leo", "Jack"), 
                     actor2 = c("Kate", "Leo"))

     movie actor1 actor2
1  Titanic    Leo   Kate
2 Departed   Jack    Leo

to binary response variables:

     movie Leo Kate Jack
1  Titanic   1    1    0
2 Departed   1    0    1

I tried the solution described in "Convert row data to binary columns", but I could only get it to work for two variables, not three.

I would really appreciate a clean way to do this.


Solution

  • Here is a solution via tidyr:

    library(dplyr)
    library(tidyr)
    
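    mydata %>%
      # stack actor1 and actor2 into key/value columns "actor" and "name"
      gather(actor, name, starts_with("actor")) %>%
      # flag every movie/actor pair that occurs
      mutate(present = 1) %>%
      # drop the original column label; it is no longer needed
      select(-actor) %>%
      # one column per actor name, filling missing pairs with 0
      spread(name, present, fill = 0)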
    
           movie Jack Kate Leo
     1 Departed    1    0   1
     2  Titanic    0    1   1
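
In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The same idea can be sketched with the newer functions as below. This is an alternative sketch, not part of the original answer. It assumes tidyr 1.1.0 or later, so that values_fill accepts a bare 0, and it assumes each actor appears at most once per movie. The name `result` is only for illustration.

```r
library(dplyr)
library(tidyr)

mydata <- data.frame(movie  = c("Titanic", "Departed"),
                     actor1 = c("Leo", "Jack"),
                     actor2 = c("Kate", "Leo"),
                     stringsAsFactors = FALSE)

result <- mydata %>%
  # stack every actor* column into a single "name" column
  pivot_longer(starts_with("actor"),
               names_to = "actor", values_to = "name") %>%
  # flag every movie/actor pair that occurs
  mutate(present = 1) %>%
  # the original column label (actor1, actor2) is no longer needed
  select(-actor) %>%
  # one column per actor name; pairs that never occur become 0
  pivot_wider(names_from = name, values_from = present,
              values_fill = 0)

result
```

Unlike spread(), pivot_wider() does not sort its output by default, so rows and columns should come out in order of first appearance rather than alphabetically. Because starts_with("actor") selects every matching column, the same pipeline should also cover an actor3 column without changes.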