Search code examples
rdataframedata-cleaningsurvey

How to convert a daatset where some subjects chose multiple answers into a dummy variables format?


I have this example dataset

df <- data.frame(subjects = 1:12,
                 Why_are_you_not_happy = 
                   c(1,2,"1,2,5",5,1,2,"3,4",3,2,"1,5",3,4),
                 why_are_you_sad = 
                   c("1,2,3",1,2,3,"4,5,3",2,1,4,3,1,1,1) )

And would like to convert it into a dummy variables format (based on the 5 answers of each question). Can someone guide me through an effective way ? thanks.


Solution

  • You can separate_rows for multiple choices, convert to dummy and summarise by subjects (to get one row per subjects, with all their choices).

    library(fastDummies)
    library(tidyr)
    library(dplyr)
    
    df %>% 
      separate_rows(Why_are_you_not_happy, why_are_you_sad) %>% 
      dummy_cols(c("Why_are_you_not_happy", "why_are_you_sad"),
                 remove_selected_columns = TRUE) %>% 
      group_by(subjects) %>% 
      summarise(across(everything(), max))
    

    output

    # A tibble: 12 × 11
       subjects Why_are_you…¹ Why_a…² Why_a…³ Why_a…⁴ Why_a…⁵ why_a…⁶ why_a…⁷ why_a…⁸ why_a…⁹ why_a…˟
          <int>         <int>   <int>   <int>   <int>   <int>   <int>   <int>   <int>   <int>   <int>
     1        1             1       0       0       0       0       1       1       1       0       0
     2        2             0       1       0       0       0       1       0       0       0       0
     3        3             1       1       0       0       1       0       1       0       0       0
     4        4             0       0       0       0       1       0       0       1       0       0
     5        5             1       0       0       0       0       0       0       1       1       1
     6        6             0       1       0       0       0       0       1       0       0       0
     7        7             0       0       1       1       0       1       0       0       0       0
     8        8             0       0       1       0       0       0       0       0       1       0
     9        9             0       1       0       0       0       0       0       1       0       0
    10       10             1       0       0       0       1       1       0       0       0       0
    11       11             0       0       1       0       0       1       0       0       0       0
    12       12             0       0       0       1       0       1       0       0       0       0