r, dataframe, strsplit

strsplit: split on either whitespace or a forward slash, depending on the input


Once again I'm struggling with strsplit. I'm transforming some strings to data frames, but a forward slash (/) and some white space in my strings keep bugging me. I could work around it, but I'm eager to learn whether I can use some fancy either/or in strsplit. My working example below should illustrate the issue.

The function I'm currently using:

str_to_df <- function(string){
t(sapply(1:length(string), function(x) strsplit(string, "\\s+")[[x]])) }

One type of string I get:

string1 <- c('One\t58/2', 'Two 22/3', 'Three\t15/5')
str_to_df(string1)
#>      [,1]    [,2]  
#> [1,] "One"   "58/2"
#> [2,] "Two"   "22/3"
#> [3,] "Three" "15/5"

Another type I get in the same spot:

string2 <- c('One 58 / 2', 'Two 22 / 3', 'Three 15 / 5')
str_to_df(string2)
#>      [,1]    [,2] [,3] [,4]
#> [1,] "One"   "58" "/"  "2" 
#> [2,] "Two"   "22" "/"  "3" 
#> [3,] "Three" "15" "/"  "5" 

They obviously produce different outputs, and I can't figure out how to code a solution that works for both. Below is my desired outcome. Thank you in advance!

desired_outcome <- structure(c("One", "Two", "Three", "58", "22",
                               "15", "2", "3", "5"), .Dim = c(3L, 3L))
desired_outcome
#>      [,1]    [,2] [,3]
#> [1,] "One"   "58" "2" 
#> [2,] "Two"   "22" "3" 
#> [3,] "Three" "15" "5"

Solution

  • We can create a function that splits at one or more consecutive spaces, tabs, or forward slashes, so that "58/2" and "58 / 2" are both split into "58" and "2"

    f1 <- function(str1) do.call(rbind, strsplit(str1, "[/\t ]+"))
    f1(string1)
    #     [,1]    [,2] [,3]
    #[1,] "One"   "58" "2" 
    #[2,] "Two"   "22" "3" 
    #[3,] "Three" "15" "5" 
    
    f1(string2)
    #     [,1]    [,2] [,3]
    #[1,] "One"   "58" "2" 
    #[2,] "Two"   "22" "3" 
    #[3,] "Three" "15" "5" 
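
    If the "either/or" should be spelled out, regex alternation with | is another way to write the delimiter. The following is a minimal sketch, assuming that fields never contain whitespace or slashes themselves; split_fields is a hypothetical name used for illustration only.

    ```r
    # Hypothetical helper: the delimiter is EITHER a slash with optional
    # whitespace around it, OR a run of one or more whitespace characters.
    split_fields <- function(x) {
      do.call(rbind, strsplit(x, "\\s*/\\s*|\\s+"))
    }

    string1 <- c('One\t58/2', 'Two 22/3', 'Three\t15/5')
    string2 <- c('One 58 / 2', 'Two 22 / 3', 'Three 15 / 5')

    # Both inputs should yield the same 3 x 3 character matrix
    identical(split_fields(string1), split_fields(string2))
    ```

    Compared with the character class, \\s also covers whitespace other than space and tab, at the cost of a slightly longer pattern.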
    

    Or we can use read.csv after replacing each run of spaces, tabs, or slashes with a common delimiter (here a comma)

    read.csv(text=gsub("[\t/ ]+", ",", string1), header = FALSE)
    #     V1 V2 V3
    #1   One 58  2
    #2   Two 22  3
    #3 Three 15  5
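
    The same call can be checked against the second input. A small sketch, assuming the goal is a data frame rather than a character matrix; stringsAsFactors = FALSE is set explicitly here (an addition, not in the call above) so that the first column is character regardless of R version.

    ```r
    string2 <- c('One 58 / 2', 'Two 22 / 3', 'Three 15 / 5')

    # Collapse every run of tabs, slashes, or spaces into a single comma,
    # then let read.csv parse the resulting lines
    df2 <- read.csv(text = gsub("[\t/ ]+", ",", string2),
                    header = FALSE, stringsAsFactors = FALSE)

    # Unlike the strsplit approach, the numeric columns are parsed as integers
    sapply(df2, class)
    ```

    This differs from the matrix returned by f1, where every cell stays a character string.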