Replace repeated characters by number of repetitions in string


I'm trying to count how many times a given character is repeated consecutively and use that count in the replacement string. Here's an example:

before = c("w","www","answer","test","wwwxww")
after = c("w{1}","w{3}","answ{1}er","test","w{3}xw{2}")

Is there a simple way to achieve this, for instance by combining gsub and a regex? My attempt so far:

before = c("w","www","answer","test")
after = gsub("w+", "w{n}", before)

Result:

after = c("w{n}", "w{n}", "answ{n}er", "test")

The idea is to replace n with the exact number of occurrences in each run.


Solution

  • A base R approach: use gregexpr to find each run of w, then assign to regmatches to replace every match with its match length.

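    # Locate every run of one or more "w" in each element of before
    x <- gregexpr("w+", before)
    # Replace each run with "w{<run length>}", read from the match.length attribute
    regmatches(before, x) <- lapply(x, \(y) paste0("w{", attr(y, "match.length"), "}"))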
    before
    #[1] "w{1}"      "w{3}"      "answ{1}er" "test"      "w{3}xw{2}"
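
The same idea can be wrapped in a small function so the input vector is not modified in place. This is only a sketch: the name count_runs and its ch argument are hypothetical and not part of the original answer, and it assumes ch is a single plain letter or digit (a regex metacharacter such as "." would need escaping first).

```r
# Hypothetical helper (not from the original answer): replaces every run of
# `ch` in each element of `x` with ch{<run length>}.
# Assumes `ch` is a single plain letter or digit, not a regex metacharacter.
count_runs <- function(x, ch = "w") {
  pattern <- paste0(ch, "+")   # one or more consecutive copies of ch
  m <- gregexpr(pattern, x)    # match positions and lengths, one entry per element
  # Build one replacement per match from its length
  regmatches(x, m) <- lapply(m, function(y) {
    paste0(ch, "{", attr(y, "match.length"), "}")
  })
  x                            # modified copy; the caller's vector is untouched
}

before <- c("w", "www", "answer", "test", "wwwxww")
after <- count_runs(before)
```

Calling count_runs(before, "s") would count runs of s instead, leaving before itself unchanged.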