r multiple-columns tidyr survey strsplit

Conditionally Separate String into Columns

It is common in surveys to ask a question and then tell participants to "select all that apply". For example, "Which foods do you enjoy eating (Please select all that apply)?" a) Sushi, b) Pasta, c) Hamburger.

Assuming four (N=4) participants answered this question, the data could look like this.

food.df <- data.frame(id = c(1,2,3,4), food.choice = c("1,2", "", "1,2,3", "3"))

What I am trying to do is conditionally separate these into unique columns using a method that is flexible on the number of individuals and the number of food choice attributes (i.e. Sushi, Pasta, Hamburger, ....). The final data would look something like this.

food.final <- data.frame(id= c(1,2,3,4), sushi = c(1,0,1,0), pasta = c(1,0,1,0), hamburger = c(0,0,1,1))

The more advanced version of this would allow for conditional groupings. You can think of this as grouping by food groups, location, etc. Assuming we were grouping by "selected foods that have protein" this could be coded to reflect total choices. This could look something like this.

food.group <- data.frame(id = c(1,2,3,4), protein = c(1,0,2,1), non.protein = c(1,0,1,0))

I have tried to use tidyr::separate, strsplit, and other column splitting functions but cannot seem to get the desired outcome. Appreciate the help on this and hopefully, the answer helps other users of R who do survey work.

Solution

We may use fastDummies

library(fastDummies)
library(dplyr)
dummy_cols(food.df, 'food.choice', split = ",", 
    remove_selected_columns = TRUE) %>%
    setNames(c("id", "sushi", "pasta", "hamburger"))

-output

   id sushi pasta hamburger
1  1     1     1         0
2  2     0     0         0
3  3     1     1         1
4  4     0     0         1

If the renaming should be automatic, create a named vector and use str_replace

library(stringr)
nm1 <- setNames(c("sushi", "pasta", "hamburger"), 1:3)
 dummy_cols(food.df, 'food.choice', split = ",", 
    remove_selected_columns = TRUE) %>% 
   rename_with(~ str_replace_all(str_remove(.x, 'food.choice_'), nm1), -id)
  id sushi pasta hamburger
1  1     1     1         0
2  2     0     0         0
3  3     1     1         1
4  4     0     0         1

For the second case, we may use str_count

food.df %>%
   mutate(protein = str_count(food.choice, '[13]'), 
    non.protein = str_count(food.choice, '2'), .keep = 'unused')
  id protein non.protein
1  1       1           1
2  2       0           0
3  3       2           1
4  4       1           0