Replace parts of string using package stringi (regex)


I have a string:

string <- "abbccc"

I want to replace each run of the same letter with that letter followed by the number of times it occurs (single letters stay as they are). So I want to get something like this: "ab2c3"

I use the stringi package to do this, but it doesn't work exactly the way I want. Let's say I already have a vector with the replacement parts:

library(stringi)

vector <- c("b2", "c3")
stri_replace_all_regex(string, "([a-z])\\1{1,8}", vector)

The output:

[1] "ab2b2" "ac3c3"

The output I want: [1] "ab2c3"

I also tried it this way:

stri_replace_all_regex(string, "([a-z])\\1{1,8}", vector, vectorize_all=FALSE)

but I get an error:

Error in stri_replace_all_regex(string, "([a-z])\\1{1,8}", vector, vectorize_all = FALSE) : 
  vector length not consistent with other arguments

Solution

  • Not regex, but strsplit and rle with some paste magic:

    string <- c("abbccc", "bbaccc", "uffff", "aaabccccddd")
    
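    # Split each string into single characters, then run-length encode them
    sapply(lapply(strsplit(string, ""), rle), function(x) {
        # Paste each letter with its run length; a run of length 1 gets no number
        paste(x$values, ifelse(x$lengths == 1, "", x$lengths), sep="", collapse="")
    })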
    
    ## [1] "ab2c3"   "b2ac3"   "uf4"     "a3bc4d3"
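
    To see what the one-liner above does, here is a minimal step-by-step sketch in base R. The function name compress_runs is only an illustrative choice, not part of any package, and the sketch assumes the input is a plain character vector without NA values:

```r
# compress_runs is a hypothetical helper name used only for this illustration.
compress_runs <- function(strings) {
  # 1. Split every string into a vector of single characters.
  chars <- strsplit(strings, "")

  vapply(chars, function(ch) {
    # 2. Run-length encode: for "abbccc" this gives
    #    lengths = 1 2 3 and values = "a" "b" "c".
    runs <- rle(ch)

    # 3. Keep the count only for runs longer than one character.
    counts <- ifelse(runs$lengths == 1, "", runs$lengths)

    # 4. Glue letter and count together, then collapse into one string.
    paste(runs$values, counts, sep = "", collapse = "")
  }, character(1))
}

compress_runs("abbccc")
```

    Using vapply with character(1) instead of sapply is a small design choice: it guarantees that one string comes back per input element.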