Suppose I have a table of strings:
df <- tibble::tribble(
  ~alternatives,
  " 23.32 | x232 code | This is a description| 43.11 | a341 code | some other description | optimised | v333 code | still another description"
)
I would like to split the string at the positions preceding numeric values and the word "optimised", e.g. before 23.32, before 43.11, and before "optimised".
In each cell I expect to get the vector:
c("23.32 | x232 code | This is a description|", "43.11 | a341 code | some other description |", "optimised | v333 code | still another description")
What regex pattern would split the string before these specific patterns? The number of pipe characters between them can vary, so I cannot rely on the pipes. I am vaguely aware of lookaheads and lookbehinds. The following code does not return what I expect, but I believe the solution looks similar:
library(dplyr)
library(stringr)

df2 <- df %>%
  mutate(alternatives = str_split(alternatives,
                                  pattern = "(?<=[a-zA-Z])\\s*(?=[0-9])"))
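A minimal sketch for diagnosing the attempt, assuming stringr is installed: it applies the same pattern to the example string outside the data frame. Because `(?<=[a-zA-Z])\\s*(?=[0-9])` only requires a letter directly before a digit (with optional whitespace in between), it splits inside the codes x232, a341 and v333 rather than before 23.32, 43.11 and "optimised".

```r
library(stringr)

# Example string from the question (note the leading space)
x <- " 23.32 | x232 code | This is a description| 43.11 | a341 code | some other description | optimised | v333 code | still another description"

# The attempted pattern: a letter, optional whitespace, then a digit.
# It matches a zero-width position inside "x232", "a341" and "v333",
# and never before "| 43.11" because "|" is not a letter.
attempt <- str_split(x, pattern = "(?<=[a-zA-Z])\\s*(?=[0-9])")[[1]]
print(attempt)
```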
What would be the solution?
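You could split on the following regex pattern, which matches whitespace that follows a non-whitespace character and precedes either a decimal number or the word "optimised":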
(?<=\S)\s+(?=(?:\d+\.\d+|optimised)\b)
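A sketch that annotates each part of the pattern and applies it to the example string directly, assuming stringr is installed. Note that the first piece keeps the leading space of the input, because `\\s+` only consumes whitespace at the split points.

```r
library(stringr)

# Example string from the question (leading space included)
x <- " 23.32 | x232 code | This is a description| 43.11 | a341 code | some other description | optimised | v333 code | still another description"

# (?<=\S)    the split point must follow a non-whitespace character,
#            so the leading space before 23.32 is never a split point
# \s+        the whitespace consumed (dropped) at each split
# (?=...)    the split point must be followed by ...
#   \d+\.\d+ a decimal number such as 23.32 or 43.11, or
#   optimised the literal word "optimised",
# \b         ending at a word boundary
pattern <- "(?<=\\S)\\s+(?=(?:\\d+\\.\\d+|optimised)\\b)"

parts <- str_split(x, pattern)[[1]]
print(parts)
```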
Updated script:
library(dplyr)
library(stringr)

df2 <- df %>%
  mutate(alternatives = str_split(alternatives,
                                  pattern = "(?<=\\S)\\s+(?=(?:\\d+\\.\\d+|optimised)\\b)"))
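A usage sketch, assuming dplyr, stringr and tidyr are installed. Trimming the input with `str_trim()` first removes the leading space, so the first piece matches the expected output exactly, and `tidyr::unnest()` turns the resulting list-column into one row per alternative. The trimming and unnesting steps are optional additions, not part of the script above.

```r
library(dplyr)
library(stringr)
library(tidyr)

df <- tibble::tribble(
  ~alternatives,
  " 23.32 | x232 code | This is a description| 43.11 | a341 code | some other description | optimised | v333 code | still another description"
)

split_pattern <- "(?<=\\S)\\s+(?=(?:\\d+\\.\\d+|optimised)\\b)"

# Trim surrounding whitespace, then split; each cell becomes a character vector
df2 <- df %>%
  mutate(alternatives = str_split(str_trim(alternatives), pattern = split_pattern))

# Optional: one row per alternative instead of a list-column
df_long <- df2 %>%
  unnest(cols = alternatives)

print(df_long)
```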