How to separate multiple-choice phrases (Google Forms) into different columns?

There are already some questions on this issue (here and here), but in both cases the examples use choices that are simply comma-delimited. My case is a little different:

  1. In my survey, respondents could select multiple phrases, and (to make it harder) some of the phrases contain commas.
  2. There is an "Other reasons" option, where respondents could write their own sentences.
  3. Each predefined sentence starts with a capital letter and has no other capital letters. The "Other reasons" answers are the exception: they may start with a lowercase letter, depending on how the respondent wrote them.

The list of predefined choices is registered as follows:

Q1.list <- c("Phrase one without comma", "Phrase two also without comma", "Phrase three, with comma")

The database looks like this:

"Phrase one without comma, Phrase two also without comma"
"Phrase two also without comma, Phrase three, with comma"
"Phrase three, with comma, Phrase four, other reasons"
"Phrase one without comma, Phrase four, other reasons, Phrase five other reasons"

And I would like to transform the data set in this way:

Q1.1          Q1.2          Q1.3          Others
1             1             0             0
0             1             1             0
0             0             1             "Phrase four, other reasons"
1             0             0             "Phrase four, other reasons, Phrase five other reasons [and everything else that is not on the Q1.list]"

Could someone shed light on how to solve this problem?


  • You can use dplyr and stringr as follows.

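    library(dplyr)
    library(stringr)

    data %>%
      # unary + turns the logical from str_detect() into a 1/0 flag
      transmute(Q1.1 = +(str_detect(Q1, Q1.list[1])),
                Q1.2 = +(str_detect(Q1, Q1.list[2])),
                Q1.3 = +(str_detect(Q1, Q1.list[3])),
                # strip every predefined phrase; what remains is the free text
                Others = str_remove_all(Q1, str_c(Q1.list, collapse = '|')),
                # drop the ", " separator left at the start, if any
                Others = if_else(str_sub(Others, 1, 2) == ', ',
                                 str_sub(Others, 3), Others),
                # an empty string means there was no free-text answer
                Others = if_else(Others == '', '0', Others))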
    #    Q1.1  Q1.2  Q1.3 Others                                               
    #   <int> <int> <int> <chr>                                                
    # 1     1     1     0 0                                                    
    # 2     0     1     1 0                                                    
    # 3     0     0     1 Phrase four, other reasons                           
    # 4     1     0     0 Phrase four, other reasons, Phrase five other reasons


    # Data used above
    data <- structure(list(Q1 = c("Phrase one without comma, Phrase two also without comma", 
    "Phrase two also without comma, Phrase three, with comma", "Phrase three, with comma, Phrase four, other reasons", 
    "Phrase one without comma, Phrase four, other reasons, Phrase five other reasons"
    )), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"))
    Q1.list <- c("Phrase one without comma", "Phrase two also without comma", "Phrase three, with comma")
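
One caveat on the approach above: `str_detect()` and `str_remove_all()` treat each phrase as a regular expression by default, so a predefined choice containing characters such as `(`, `?` or `.` would need escaping. Below is a minimal base R sketch (an alternative, not the method from the answer) that matches the phrases literally with `fixed = TRUE`. The helper name `split_choices()` is made up for illustration, and the sketch assumes that no predefined phrase is a substring of another one.

```r
# Sample data and predefined choices, copied from the question.
Q1 <- c(
  "Phrase one without comma, Phrase two also without comma",
  "Phrase two also without comma, Phrase three, with comma",
  "Phrase three, with comma, Phrase four, other reasons",
  "Phrase one without comma, Phrase four, other reasons, Phrase five other reasons"
)
Q1.list <- c("Phrase one without comma",
             "Phrase two also without comma",
             "Phrase three, with comma")

# split_choices() is a hypothetical helper, not part of any package.
# Assumption: no predefined phrase is contained in another one.
split_choices <- function(x, choices) {
  # One 0/1 indicator per predefined phrase, matched as a literal string.
  flags <- vapply(choices,
                  function(p) as.integer(grepl(p, x, fixed = TRUE)),
                  integer(length(x)))
  flags <- matrix(flags, nrow = length(x), ncol = length(choices))
  colnames(flags) <- paste0("Q1.", seq_along(choices))

  # Remove every predefined phrase; whatever is left is free text.
  others <- x
  for (p in choices) {
    others <- gsub(p, "", others, fixed = TRUE)
  }
  # Collapse doubled separators, then trim separators at either end.
  others <- gsub("(, ){2,}", ", ", others)
  others <- gsub("^[, ]+|[, ]+$", "", others)
  # Use "0" for rows without a free-text answer, as in the desired output.
  others[others == ""] <- "0"

  data.frame(flags, Others = others, stringsAsFactors = FALSE)
}

res <- split_choices(Q1, Q1.list)
print(res)
```

Because the free-text answers cannot be told apart from each other (they may contain commas too), everything that is not a predefined phrase ends up in a single `Others` string, as in the output the question asks for.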