r string dataframe tidyr delimiter

Multiple delimiters in a column


I have a column of results where each entry can contain multiple answers separated by ',' or '/'. I need to count the instances of each response.

[Image: what the data looks like]

What I want to end up with is

[Image: what the end should look like]

I'm at a loss how to split the Answers column in the first table. I'm terrible with string splits.

I've tried both strsplit and str_split, first on the column inside the data frame and then after turning the column into a list, but it was messy and kept giving me various error messages. I'm close with

df %>% separate_longer_delim(Answers, delim = ',/')

But I can't get the delim part to work. I can use either the comma or the slash but not both together.


Solution

  • As described in the documentation for separate_longer_delim:

    delim: By default, it is interpreted as a fixed string; use stringr::regex() and friends to split in other ways.

    library(tidyr)
    library(dplyr)
    
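    df %>%
      # split on either "," or "/", also consuming any whitespace after the delimiter
      separate_longer_delim(Answers, stringr::regex("[,/]\\s*")) %>%
      # tally each distinct answer, most frequent first
      count(Answers, sort = TRUE)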
    
    #          Answers n
    # 1           cars 2
    # 2           dirt 2
    # 3           toys 2
    # 4 all the things 1
    # 5          dolls 1
    # 6         trucks 1
    
    Data
    df <- data.frame(id = 1:4, Answers = c("toys, dirt", "cars, dolls", "cars/toys/dirt", "all the things, trucks"))
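
To see why delim = ',/' from the question does nothing, it may help to compare the two delimiter types side by side. The sketch below is only an illustration: it assumes tidyr 1.3.0 or later (where separate_longer_delim was introduced) with stringr installed, and the names fixed_result and regex_result are made up for this example. A plain string is matched literally, so ',/' only splits where a comma is immediately followed by a slash, which never happens in this data.

```r
library(tidyr)

df <- data.frame(
  id = 1:4,
  Answers = c("toys, dirt", "cars, dolls", "cars/toys/dirt", "all the things, trucks")
)

# Plain string: treated as the literal two-character sequence ",/".
# No row contains it, so every row comes back unsplit.
fixed_result <- separate_longer_delim(df, Answers, delim = ",/")
nrow(fixed_result)
# 4

# Regex character class: matches a "," or a "/", plus any whitespace after it.
regex_result <- separate_longer_delim(df, Answers, delim = stringr::regex("[,/]\\s*"))
nrow(regex_result)
# 9
```

The \\s* part is what removes the space after each comma. Without it, " dirt" and "dirt" would be counted as different answers.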