
Split string on two subpatterns into data.frame


I have a character vector:

s <- "0 / 10 %(% 1 / 11 %-% 2 / 12 %)% 3 / 13"

The goal is to split it on both / and %*% (where * stands for the symbol between the two % signs) into (x, y) points and z symbols:

data.frame(x = c(0,1,2,3), y = c(10,11,12,13), z = c("(", "-", ")", NA),
           stringsAsFactors = FALSE)
  x  y    z
1 0 10    (
2 1 11    -
3 2 12    )
4 3 13 <NA>

Notes:

  • The / separates the two coordinates of a point: I want to split x / y into an x part and a y part.
  • The symbol inside each %*% delimiter should go into a column z, but without the % signs.

I tried various versions of strsplit with no success:

trimws(unlist(strsplit(s, "[/(%*%)]")))
[1] "0"  "10" ""   ""   "1"  "11" "-"  "2"  "12" ""   ""   "3"  "13"

Issues:

  • The - is not caught by (%*%). Why?
  • The result contains empty strings. Why?
  • I don't know how to store the split symbols in the z column.
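A minimal sketch that may help with the first two issues: it re-runs the attempted split, plus a split of a made-up toy string ("a%(%b", chosen only for illustration), to show that the bracketed pattern treats each of /, (, %, * and ) as a separate one-character delimiter. A run such as %(% is therefore three delimiters in a row, and strsplit() returns the empty text between each adjacent pair.

```r
s <- "0 / 10 %(% 1 / 11 %-% 2 / 12 %)% 3 / 13"

# Inside [...] each character is a literal alternative, so "[/(%*%)]"
# splits on any ONE of the characters /  (  %  *  )
parts <- trimws(strsplit(s, "[/(%*%)]")[[1]])
print(parts)

# "-" is not in the class, so it is kept as the text between two % delimiters.
# "%(%" is three delimiters in a row; strsplit() returns the empty text
# between each pair of adjacent delimiters, hence the "" entries.
toy <- strsplit("a%(%b", "[/(%*%)]")[[1]]
print(toy)
```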

Solution

  • This solves your problem:

    
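    str <- "0 / 10 %(% 1 / 11 %-% 2 / 12 %)% 3 / 13"
    
    str_sub <- gsub("[%/]", "", str)              # remove every % and /
    str_split <- strsplit(str_sub, "\\s+")[[1]]   # split on runs of whitespace
    n_pad <- (3 - length(str_split) %% 3) %% 3    # NAs needed to reach a multiple of 3 (0 if already one)
    str_corr <- c(str_split, rep(NA, n_pad))      # pad the end with NAs
    
    df <- as.data.frame(matrix(str_corr, ncol = 3, byrow = TRUE),
                        stringsAsFactors = FALSE) # convert to data.frame via matrix
    colnames(df) <- c("x", "y", "z")              # set column names
    df$x <- as.numeric(df$x)                      # the matrix is character, so
    df$y <- as.numeric(df$y)                      # convert x and y back to numbers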
    

    Created on 2019-04-09 by the reprex package (v0.2.1)
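As an alternative sketch (not the answer's method, and only checked against the example string), one could split on the %symbol% delimiters directly with a pattern such as "%[^%]+%" and extract the symbols with regmatches() and gregexpr(), which keeps x and y numeric. It assumes every delimiter has the form %symbol% and that there is exactly one more x / y point than there are symbols, as in the question.

```r
s <- "0 / 10 %(% 1 / 11 %-% 2 / 12 %)% 3 / 13"

# Delimiters look like %<symbol>%: a %, a run of non-% characters, a %
delim <- "%[^%]+%"

# 1. Split on the delimiters to get the "x / y" pieces
points <- trimws(strsplit(s, delim)[[1]])

# 2. Pull out the delimiters themselves and drop the % signs
z <- gsub("%", "", regmatches(s, gregexpr(delim, s))[[1]])

# 3. Split each "x / y" piece on the slash and any surrounding spaces
xy <- strsplit(points, "\\s*/\\s*")

res <- data.frame(
  x = as.numeric(vapply(xy, `[`, character(1), 1)),
  y = as.numeric(vapply(xy, `[`, character(1), 2)),
  # Assumption: one more point than symbols, so the last row gets NA
  z = c(z, NA),
  stringsAsFactors = FALSE
)
print(res)
```

For the example string, res should be identical to the data.frame shown in the question.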

    To your first issue:

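    1. The pattern never captures the - because nothing in it asks for a -. In "[/(%*%)]" the square brackets form a character class that matches any single one of /, (, %, * or ); inside a class, * and the parentheses are literal characters, not a quantifier or a group. Even without the brackets, %*% only asks the regex to repeat % zero or more times (with the *) and then match one more %, so it still would not match a -.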
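To see the difference, here is a small sketch on a made-up string x that compares what "%*%" matches with what an alternative pattern "%[^%]*%" (a suggestion for illustration, not from the question) matches, and how strsplit() uses each:

```r
x <- "1 %-% 2"

# "%*%" means zero or more "%" followed by one "%": it matches only % signs
m1 <- regmatches(x, gregexpr("%*%", x))[[1]]
print(m1)   # two separate "%" matches; the "-" is not included

# "%[^%]*%" means "%", any run of non-% characters, then "%",
# so it matches the whole delimiter including the symbol
m2 <- regmatches(x, gregexpr("%[^%]*%", x))[[1]]
print(m2)

print(strsplit(x, "%*%")[[1]])       # "-" survives as its own piece
print(strsplit(x, "%[^%]*%")[[1]])   # the whole "%-%" is consumed
```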