Although similar regex questions have been asked before, I'm still struggling to get a working solution (even after consulting ChatGPT).
Take the following example:
text <- c("test1", "test2 | ", "test3 | test3 | test 3", "test4 | test4 | test 4 | test4")
I want to remove all text starting from the n-th (in my case, the second) occurrence of " | ".
So the output should be:
output <- c("test1", "test2 | ", "test3 | test3", "test4 | test4")
I got it working for strings with up to two " | " occurrences using
str_remove(text, "( \\| [^\\|]+$)")
but this doesn't generalize to strings with more than two occurrences of the pattern.
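To see why that attempt stops working, here is a quick check on the example vector (a sketch, assuming stringr is installed). Because of the $ anchor combined with [^\\|]+, the pattern can only ever match the last " | ..." segment, so in the fourth string one delimiter too many survives.

```r
library(stringr)

text <- c("test1", "test2 | ", "test3 | test3 | test 3", "test4 | test4 | test 4 | test4")

# The attempt from the question: " | " followed by non-pipe characters up to
# the end of the string, so only the *last* segment can be removed
attempt <- str_remove(text, "( \\| [^\\|]+$)")

attempt[3]  # "test3 | test3"           (correct by coincidence: exactly two delimiters)
attempt[4]  # "test4 | test4 | test 4"  (still contains a second " | ")
```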
You can use
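library(stringr)
n <- 2  # cut from the n-th " | " onward
# group 1 captures everything before the n-th " | ";
# the whole match is replaced by that group
str_replace(text, paste0("^(.*?(?: \\| .*?){", n-1, "}) \\| .*"), "\\1")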
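If n needs to vary, the same pattern can be wrapped in a small helper. The name remove_from_nth and its arguments are hypothetical, chosen only for illustration; the delimiter stays hard-coded as " | ", so this sketch does not handle arbitrary (unescaped) delimiters.

```r
library(stringr)

# Hypothetical helper: drop everything from the n-th " | " onward.
# Strings with fewer than n delimiters do not match and are returned unchanged.
remove_from_nth <- function(x, n = 2) {
  stopifnot(n >= 1)
  # group 1 = text before the n-th " | " (the first n - 1 delimiters are skipped lazily)
  pattern <- paste0("^(.*?(?: \\| .*?){", n - 1, "}) \\| .*")
  str_replace(x, pattern, "\\1")
}

text <- c("test1", "test2 | ", "test3 | test3 | test 3", "test4 | test4 | test 4 | test4")

remove_from_nth(text, 2)  # "test1" "test2 | " "test3 | test3" "test4 | test4"
remove_from_nth(text, 3)  # "test1" "test2 | " "test3 | test3 | test 3" "test4 | test4 | test 4"
```

With n = 1 the quantifier becomes {0}, so the helper cuts at the first delimiter.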
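where

- \| is your delimiter (a literal pipe, written as \\| inside an R string),
- (?: \| .*?){n-1} skips over the first n-1 delimiters together with the text after each of them,
- .*? matches any text, as few characters as possible (other than line break chars; add (?s) at the start of the pattern to make it match across lines),
- str_replace is required so that the whole match can be replaced with the value of the first capturing group (\\1), which keeps everything before the n-th delimiter.

Strings with fewer than n delimiters do not match at all, so str_replace returns them unchanged; this is why "test1" and "test2 | " are kept as they are.

See the R demo online (and here is the resulting regex demo).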
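To illustrate the note about line breaks, here is a small check with a made-up two-line string (the input x is hypothetical). Without (?s), .*? cannot cross the \n, so the second delimiter is never reached and the string comes back unchanged; with (?s) prepended, the cut works across the line break.

```r
library(stringr)

n <- 2
pattern <- paste0("^(.*?(?: \\| .*?){", n - 1, "}) \\| .*")

# Hypothetical input: the second " | " sits on the second line
x <- "a | b\nc | d"

# Without (?s), "." does not match "\n", so there is no match at all
no_dotall <- str_replace(x, pattern, "\\1")    # "a | b\nc | d"

# With (?s), "." also matches "\n", so the text from the second " | " is removed
with_dotall <- str_replace(x, paste0("(?s)", pattern), "\\1")  # "a | b\nc"
```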