Separating cells with several delimiters (splitstackshape)


I am working with a data frame whose values need to be split on several delimiters. The most common are a semicolon and a period followed by a slash: './'.

How do I complete the code in order to apply both delimiters?

library(tidyverse)
library(splitstackshape)

values <- c("cat; dog; mouse", "cat ./ dog ./ mouse")
data <- data.frame(cbind(values))

separated <- cSplit(data.frame(data), "values", sep = ";", drop = TRUE)

I tried a vector solution but without much success.


Solution

  • I'm not exactly sure what your final output structure should be, but one approach is to start with tidyr::separate, which puts each animal in its own column:

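    # "\\." matches a literal period (a bare "." matches any character);
    # "\\s*" drops the spaces around each delimiter
    df <- tidyr::separate(data, col = values, 
                    into = c("Animal1", "Animal2", "Animal3"), 
                    sep = "\\s*(;|\\./)\\s*")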
    
    #  Animal1 Animal2 Animal3
    #1     cat     dog   mouse
    #2     cat     dog   mouse
    

    If the number of elements in each string is not known in advance, you could instead convert every delimiter to a semicolon and let cSplit work out the columns:

    # Add in a third value to data with only 2 animals
    values <- c("cat; dog; mouse", "cat ./ dog ./ mouse", "frog; squirrel")
    data <- data.frame(cbind(values))
    
    
    # Replace each "./" with ";" (the period is escaped so it matches literally)
    data_clean <- gsub(";|\\./", ";", data$values)
    separated <- splitstackshape::cSplit(data.frame(values = data_clean), 
                                         "values", sep = ";", drop = TRUE)
    
    #    values_1 values_2 values_3
    # 1:      cat      dog    mouse
    # 2:      cat      dog    mouse
    # 3:     frog squirrel     <NA>
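
To make the role of the escaped period concrete, here is a minimal base R sketch. It is not part of the original answer: the names `pattern`, `parts`, `padded`, and `separated_base` are made up for illustration, and padding short rows with NA is an assumption about the output you want.

```r
# Sketch only: base R, no extra packages needed.
values <- c("cat; dog; mouse", "cat ./ dog ./ mouse", "frog; squirrel")

# Unescaped, "." matches any character, so "./" also swallows the "t" in "t/".
unescaped <- gsub(";|./", ";", "cat/dog")    # "ca;dog"
escaped   <- gsub(";|\\./", ";", "cat/dog")  # "cat/dog" (left untouched)

# Split on either delimiter and absorb the surrounding whitespace.
pattern <- "\\s*(;|\\./)\\s*"
parts <- strsplit(values, pattern)

# Pad every row to the longest length with NA, then bind into a data frame.
n <- max(lengths(parts))
padded <- t(vapply(parts, function(p) {
  length(p) <- n  # extending a character vector fills with NA
  p
}, character(n)))

separated_base <- as.data.frame(padded, stringsAsFactors = FALSE)
names(separated_base) <- paste0("values_", seq_len(n))
separated_base
```

This yields the same three-column layout as the cSplit result, so it may be useful when adding a dependency is not desirable; cSplit remains the shorter option when splitstackshape is already loaded.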