r regex stringr

Replace substring based on the position in the string via regex


Let's assume a certain pattern occurs in my string a known number of times (n), and I do not want to make any assumptions about the rest of the string (in particular, about the text between those occurrences).

Furthermore, I have a vector of length n (sf, say), and I want to append the corresponding element to each occurrence of the pattern. So, for each match, I need to know how many matches have already been hit.

I could think of the following solution:

library(stringr)
sf <- letters[4:1]
ss <- "fdskjhf xx sd ss xx wwwe xx ss  xx sdsd"
#              ^^ 1st   ^^ 2nd  ^^ 3rd ^^ 4th
# add:         _sf[1]   _sf[2]  _sf[3] _sf[4]
# that is:     xx_d     xx_c    xx_b   xx_a


## add _sf[i] to the ith occurrence of "xx" in ss
goal <- "fdskjhf xx_d sd ss xx_c wwwe xx_b ss  xx_a sdsd"

my_replacer_factory <- function(sf, start = 0) {
  cnt <- start
  function(el) {
    cnt <<- cnt + 1
    paste0(el, "_", rev(sf)[cnt])
  }
}

my_replacer <- my_replacer_factory(sf)
(res <- str_replace_all(ss, fixed("xx"), my_replacer))
# [1] "fdskjhf xx_d sd ss xx_c wwwe xx_b ss  xx_a sdsd"

all.equal(res, goal)
# [1] TRUE

This apparently works, but it feels error-prone because I rely on str_replace_all making its replacements from right to left. What if a future implementation changes this behaviour or parallelizes it?

Any ideas on how to achieve this differently? Ideally with stringr functions.


Similar idea:

my_replacer_factory <- function(sf) {
  suffixes <- rev(sf)
  function(el) {
    on.exit(suffixes <<- suffixes[-1L], add = TRUE)
    paste0(el, "_", suffixes[1L])
  }
}
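
Both closures above still depend on the order in which str_replace_all calls the replacement function. A minimal sketch that avoids this, assuming the pattern is a fixed string and that sf has one element per match: locate all matches first with str_locate_all, then replace them in an explicit loop with str_sub<-, so the order is under our own control. The loop runs from right to left only so that the start/end positions of the earlier matches stay valid while the string grows.

```r
library(stringr)

sf <- letters[4:1]
ss <- "fdskjhf xx sd ss xx wwwe xx ss  xx sdsd"

# Matrix with one row per match and columns "start" and "end"
locs <- str_locate_all(ss, fixed("xx"))[[1]]

res <- ss
# Walk the matches from last to first; match i always gets sf[i]
for (i in rev(seq_len(nrow(locs)))) {
  str_sub(res, locs[i, "start"], locs[i, "end"]) <- paste0("xx_", sf[i])
}
res
# [1] "fdskjhf xx_d sd ss xx_c wwwe xx_b ss  xx_a sdsd"
```

Here the index i ties each suffix to its match directly, so nothing hinges on how str_replace_all is implemented internally.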

Solution

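  • One way is to use regmatches<-, which takes all replacements as a single vector, so the i-th value goes to the i-th match and no call order is involved.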

    sf <- letters[4:1]
    ss <- "fdskjhf xx sd ss xx wwwe xx ss  xx sdsd"
    
    regmatches(ss, gregexpr("xx", ss)) <- list(paste0("xx_", sf))
    ss
    #[1] "fdskjhf xx_d sd ss xx_c wwwe xx_b ss  xx_a sdsd"
    
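    # Alternative with lookbehind; reset ss first, since the call above already modified it
    ss <- "fdskjhf xx sd ss xx wwwe xx ss  xx sdsd"
    regmatches(ss, gregexpr("(?<=xx)", ss, perl=TRUE)) <- list(paste0("_", sf))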
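
Since the question asks for stringr functions, here is a rough stringr-only sketch of the same idea, assuming the pattern is a fixed string: split the string at every match, then glue the pieces back together with the suffixed pattern in between. The length check is my own addition and not part of the original post.

```r
library(stringr)

sf <- letters[4:1]
ss <- "fdskjhf xx sd ss xx wwwe xx ss  xx sdsd"

# Split at every literal "xx": n matches give n + 1 pieces
pieces <- str_split(ss, fixed("xx"))[[1]]

# Safety check (an assumption of this sketch): one suffix per match
stopifnot(length(pieces) == length(sf) + 1)

# Glue piece i to "xx_" plus sf[i]; the last piece gets nothing appended
res <- paste0(pieces, c(paste0("xx_", sf), ""), collapse = "")
res
# [1] "fdskjhf xx_d sd ss xx_c wwwe xx_b ss  xx_a sdsd"
```

Like regmatches<-, this pairs matches and suffixes by position in one vectorised step, so it does not depend on the direction in which any replacement function is called.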