I have a character vector:
s <- "0 / 10 %(% 1 / 11 %-% 2 / 12 %)% 3 / 13"
The goal is to split it on both / and %*% into (x, y) points and z symbols:
data.frame(x = c(0,1,2,3), y = c(10,11,12,13), z = c("(", "-", ")", NA),
stringsAsFactors = FALSE)
  x  y    z
1 0 10    (
2 1 11    -
3 2 12    )
4 3 13 <NA>
Notes:
/ separates points: I want to split x / y into the x-part and y-part.
%*% should go into a column z of symbols, but without the %'s.

I tried various versions of strsplit with no success:
trimws(unlist(strsplit(s, "[/(%*%)]")))
[1] "0"  "10" ""   ""   "1"  "11" "-"  "2"  "12" ""   ""   "3"  "13"
Issues:
The "-" does not get caught by (%*%), why?
How do I split s into the z column?

This solves your problem:
str <- "0 / 10 %(% 1 / 11 %-% 2 / 12 %)% 3 / 13"
str_sub <- gsub("[%/]", "", str) # replace every % and / with ""
str_split <- strsplit(str_sub, "\\s+")[[1]] # split on whitespace
str_corr <- c(str_split, rep(NA, (-length(str_split)) %% 3)) # pad with NAs up to a multiple of 3 (adds none if already a multiple)
df <- as.data.frame(matrix(str_corr, ncol = 3, byrow = TRUE), stringsAsFactors = FALSE) # convert to data.frame via matrix
colnames(df) <- c("x", "y", "z") # set column names
df[c("x", "y")] <- lapply(df[c("x", "y")], as.numeric) # x and y as numbers, as in the expected output
Created on 2019-04-09 by the reprex package (v0.2.1)
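The approach above relies on every token being separated by whitespace. As a rough alternative sketch, the symbols can be pulled out with an explicit separator pattern instead. This assumes each symbol is exactly one character wrapped in a pair of % signs and that the last point has no trailing symbol; the names sep_pattern, points, xy and res are only illustrative.

```r
s <- "0 / 10 %(% 1 / 11 %-% 2 / 12 %)% 3 / 13"

# Assumption: a separator is one arbitrary character between two % signs
sep_pattern <- "%.%"

# Symbols: extract every separator, then drop the % signs
z <- regmatches(s, gregexpr(sep_pattern, s))[[1]]
z <- gsub("%", "", z, fixed = TRUE)

# Points: the pieces between separators, e.g. "0 / 10"
points <- trimws(strsplit(s, sep_pattern)[[1]])
xy <- do.call(rbind, strsplit(points, "\\s*/\\s*"))

res <- data.frame(
  x = as.numeric(xy[, 1]),
  y = as.numeric(xy[, 2]),
  z = c(z, rep(NA, nrow(xy) - length(z))), # pad: the last point has no symbol
  stringsAsFactors = FALSE
)
print(res)
```

This keeps the separator definition in one place, so a different delimiter convention would only need a different sep_pattern.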
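To your first issue: %*% does not capture the - because, in a regex, * is not a wildcard; it asks for the preceding % to be repeated 0 or more times, and nothing in the pattern asks for a -. Note also that inside the square brackets of [/(%*%)] every character is taken literally, so that pattern only splits on the single characters /, (, %, * and ), which is why the - is left over as its own element.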
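The difference can be checked directly. The following is a small sketch, not part of the original answer; the variable names are illustrative, and the pattern "%.%|/" is just one possible way to say "any single character between two % signs, or a slash".

```r
s <- "0 / 10 %(% 1 / 11 %-% 2 / 12 %)% 3 / 13"

# Outside brackets, "%*%" means "zero or more %, then one %":
# it only ever matches % signs, never the symbol in between.
stars <- regmatches(s, gregexpr("%*%", s))[[1]]

# Inside [...] every character is literal: split on any ONE of / ( % * )
# so "-" is not a delimiter and survives as its own piece.
in_brackets <- trimws(strsplit(s, "[/(%*%)]")[[1]])

# "." matches any single character, so "%.%" covers "%(%", "%-%" and "%)%"
# as whole separators; "/" is added as an alternative.
with_dot <- trimws(strsplit(s, "%.%|/")[[1]])

print(stars)
print(in_brackets)
print(with_dot)
```

With the "." version only the numbers remain after splitting, so the symbols have to be extracted separately if they are needed for the z column.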