Let's assume I have a certain pattern in my string which occurs a known number of times (n), and I do not want to make any assumptions about the rest of the string (in particular about the substrings between those occurrences).
Furthermore, I have a vector of length n (sf, say) and I want to append the corresponding element to each occurrence of the pattern. Thus, for each match I need to know how many matches have already occurred.
I could think of the following solution:
library(stringr)
sf <- letters[4:1]
ss <- "fdskjhf xx sd ss xx wwwe xx ss xx sdsd"
#              ^^       ^^      ^^    ^^
#              1st      2nd     3rd   4th
# append:      _sf[1],  _sf[2], _sf[3], _sf[4]
# that is:     xx_d,    xx_c,   xx_b,   xx_a
## add _sf[i] to the ith occurrence of "xx" in ss
goal <- "fdskjhf xx_d sd ss xx_c wwwe xx_b ss xx_a sdsd"
my_replacer_factory <- function(sf, start = 0) {
  cnt <- start
  function(el) {
    cnt <<- cnt + 1
    paste0(el, "_", rev(sf)[cnt])
  }
}
my_replacer <- my_replacer_factory(sf)
(res <- str_replace_all(ss, fixed("xx"), my_replacer))
# [1] "fdskjhf xx_d sd ss xx_c wwwe xx_b ss xx_a sdsd"
all.equal(res, goal)
# [1] TRUE
This apparently works, but it feels error prone because I rely on the fact that str_replace_all starts replacing from the right (hence the rev(sf)). What if this behaviour changes in a future implementation, or the replacement gets parallelized?
Any idea of how to achieve this differently? Ideally with stringr functions?
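One way to probe that assumption on a given installation is to log the calls made to the replacement function. The following is only a sketch: make_call_logger is a hypothetical helper, and the probe string uses distinguishable matches (x1, x2, x3) so that the order becomes visible. How many calls are made, and in which order, may differ between stringr versions, so no output is shown.

```r
library(stringr)

# Hypothetical helper: builds a replacement function that records every
# argument it is called with and returns the matches unchanged.
make_call_logger <- function() {
  calls <- list()
  list(
    replacer = function(el) {
      calls[[length(calls) + 1L]] <<- el
      el
    },
    calls = function() calls
  )
}

probe <- "x1 a x2 b x3"
logger <- make_call_logger()
out <- str_replace_all(probe, "x[0-9]", logger$replacer)

# Inspect how often and in which order the replacer was called
str(logger$calls())
```

If the logged order is not what the counter in my_replacer_factory expects, the suffixes end up on the wrong matches.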
Similar idea:
my_replacer_factory <- function(sf) {
  suffixes <- rev(sf)
  function(el) {
    on.exit(suffixes <<- suffixes[-1L], add = TRUE)
    paste0(el, "_", suffixes[1L])
  }
}
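Both closures still depend on the order in which str_replace_all calls the replacement function. A minimal sketch that avoids this dependency, assuming the pattern is a fixed string: split the string at the pattern and interleave the pieces with the suffixed replacements. The names pieces and replacements are only illustrative.

```r
library(stringr)

sf <- letters[4:1]
ss <- "fdskjhf xx sd ss xx wwwe xx ss xx sdsd"
pattern <- "xx"

# n occurrences of the pattern yield n + 1 pieces (str_split keeps empty ones)
pieces <- str_split(ss, fixed(pattern))[[1]]
stopifnot(length(pieces) == length(sf) + 1L)

# i-th replacement belongs to the i-th occurrence, counted from the left
replacements <- paste0(pattern, "_", sf)

# rbind() builds a 2 x (n + 1) matrix; c() reads it column by column,
# which interleaves piece 1, replacement 1, piece 2, replacement 2, ...
res <- paste0(c(rbind(pieces, c(replacements, ""))), collapse = "")
res
# [1] "fdskjhf xx_d sd ss xx_c wwwe xx_b ss xx_a sdsd"
```

Because the position of every replacement is fixed by the split, the result does not depend on how str_replace_all iterates over the matches internally.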
One way would be to use regmatches<-, which assigns the replacements to the matches in order, from left to right.
sf <- letters[4:1]
ss <- "fdskjhf xx sd ss xx wwwe xx ss xx sdsd"
regmatches(ss, gregexpr("xx", ss)) <- list(paste0("xx_", sf))
ss
#[1] "fdskjhf xx_d sd ss xx_c wwwe xx_b ss xx_a sdsd"
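# Alternative with look behind: insert the suffix at the zero-length match after each "xx"
# (reset ss first, since the assignment above has already modified it)
ss <- "fdskjhf xx sd ss xx wwwe xx ss xx sdsd"
regmatches(ss, gregexpr("(?<=xx)", ss, perl=TRUE)) <- list(paste0("_", sf))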
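The regmatches<- approach can be wrapped into a small function that also checks that the number of matches equals the number of suffixes. This is just a sketch: append_by_occurrence and its sep argument are hypothetical names, and it assumes a single input string and a fixed (non-regex) pattern.

```r
# Hypothetical helper: append suffixes[i] to the i-th occurrence of `pattern`
append_by_occurrence <- function(x, pattern, suffixes, sep = "_") {
  stopifnot(length(x) == 1L)
  m <- gregexpr(pattern, x, fixed = TRUE)
  hits <- regmatches(x, m)[[1]]
  # Fail early if the known number of occurrences does not hold
  stopifnot(length(hits) == length(suffixes))
  regmatches(x, m) <- list(paste0(hits, sep, suffixes))
  x
}

sf <- letters[4:1]
ss <- "fdskjhf xx sd ss xx wwwe xx ss xx sdsd"
append_by_occurrence(ss, "xx", sf)
# [1] "fdskjhf xx_d sd ss xx_c wwwe xx_b ss xx_a sdsd"
```

Since the function returns a modified copy, ss itself stays unchanged and the call can be repeated safely.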