
Convert categorical variable into binary columns in R


I made the stupid mistake of allowing respondents to select multiple categories in a survey question.

Now the data column for this question looks something like this:

    respondent  answer_openq
    1           a
    2           a,c
    3           b
    4           a,d
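For anyone who wants to try the answers below, the table above can be rebuilt as a small data frame. This is a sketch that assumes answer_openq is stored as a character column holding comma-separated choices:

```r
# Reproducible version of the survey data shown above
# (assumption: answers are stored as comma-separated character strings)
data <- data.frame(
  respondent = 1:4,
  answer_openq = c("a", "a,c", "b", "a,d"),
  stringsAsFactors = FALSE
)
data
```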

Using the following code in R:

    datanum <- data %>% mutate(dummy = 1) %>%
      spread(key = answer_openq, value = dummy, fill = 0)

I get one column per distinct answer string ("a", "a,c", "a,d" and "b") instead of one column per category.

However, I want the dataset to look like this:

    respondent  a  b  c  d
    1           1  0  0  0
    2           1  0  1  0
    3           0  1  0  0
    4           1  0  0  1

Any help is appreciated (my thesis depends on it). Thanks :)


Solution

  • Try this:

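    library(dplyr)
    library(tidyr)
    data %>%
      # split "a,c" into two rows, one per selected answer
      separate_rows(answer_openq, sep = ',') %>%
      # one column per answer: any present value becomes 1, absent ones 0
      pivot_wider(names_from = answer_openq, values_from = answer_openq,
                  values_fn = function(x) 1, values_fill = 0)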
    # A tibble: 4 × 5
      respondent     a     c     b     d
           <int> <dbl> <dbl> <dbl> <dbl>
    1          1     1     0     0     0
    2          2     1     1     0     0
    3          3     0     0     1     0
    4          4     1     0     0     1
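The columns above come out in order of first appearance (a, c, b, d). If you want the same a, b, c, d order as the desired output, one variant is to add a constant indicator column and pass names_sort = TRUE to pivot_wider() (available in tidyr 1.1.0 and later). This is a sketch, assuming answers are separated by a comma optionally followed by spaces; the column name selected is just an illustrative choice:

```r
library(dplyr)
library(tidyr)

data <- data.frame(
  respondent = 1:4,
  answer_openq = c("a", "a,c", "b", "a,d"),
  stringsAsFactors = FALSE
)

wide <- data %>%
  # one row per respondent/answer pair; sep is a regex, so ",\\s*"
  # also tolerates answers written as "a, c" (assumption)
  separate_rows(answer_openq, sep = ",\\s*") %>%
  # constant integer flag for every selected answer
  mutate(selected = 1L) %>%
  # one column per answer, unselected answers filled with 0,
  # columns sorted alphabetically by name
  pivot_wider(names_from = answer_openq, values_from = selected,
              values_fill = 0L, names_sort = TRUE)

wide
```

Using mutate(selected = 1L) instead of values_fn keeps the indicator columns as integers; drop names_sort = TRUE if first-appearance order is fine.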