Tags: r, string, substring, stringr, stringi

How to replace matches in a string and index each match


A string can contain multiple instances of the pattern I'm trying to match. For example, if my pattern is <N(.+?)N> and my string is "My name is <N Timon N> and his name is <N Pumba N>", there are two matches. I want to replace each match with text that includes the index of that match.

So in my string "My name is <N Timon N> and his name is <N Pumba N>", I want to change the string to read "My name is [Name #1] and his name is [Name #2]".

How do I accomplish this, preferably with a single function? And preferably using functions from stringr or stringi?


Solution

  • You can do this with gregexpr and regmatches in Base R:

    my_string <- "My name is <N Timon N> and his name is <N Pumba N>"

    # Get the positions of the matches in the string
    m <- gregexpr("<N(.+?)N>", my_string, perl = TRUE)

    # Number the matches in order of appearance. my_string is a single
    # string, so all of its matches are in m[[1]].
    match_indices <- seq_along(m[[1]])

    # Replace each match with its numbered label
    regmatches(my_string, m) <- list(paste0("[Name #", match_indices, "]"))
    

    Result:

    > my_string
    # [1] "My name is [Name #1] and his name is [Name #2]"
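
    Since the question asks for a single function, the steps above can be wrapped in a small helper. This is only a minimal sketch: index_matches, prefix and suffix are made-up names (not part of base R, stringr or stringi), and it assumes you want the numbering to restart for each element of a character vector.

    ```r
    # Hypothetical helper: replace every match of `pattern` in each element
    # of `x` with prefix + match number + suffix.
    index_matches <- function(x, pattern, prefix = "[Name #", suffix = "]") {
      m <- gregexpr(pattern, x, perl = TRUE)
      labels <- lapply(m, function(pos) {
        # gregexpr() reports "no match" as a single -1
        if (pos[1] == -1L) return(character(0))
        paste0(prefix, seq_along(pos), suffix)
      })
      regmatches(x, m) <- labels
      x
    }

    index_matches(
      c("My name is <N Timon N> and his name is <N Pumba N>", "no tags here"),
      "<N(.+?)N>"
    )
    # [1] "My name is [Name #1] and his name is [Name #2]"
    # [2] "no tags here"
    ```

    Elements without a match are returned unchanged.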
    

    Note:

    This solution numbers every occurrence separately, so the same name gets a new "Name" index each time it appears. For example, the following:

    my_string <- "My name is <N Timon N> and his name is <N Pumba N>, <N Timon N> again"

    m <- gregexpr("<N(.+?)N>", my_string, perl = TRUE)

    # Three matches, so the indices are 1, 2, 3
    match_indices <- seq_along(m[[1]])

    regmatches(my_string, m) <- list(paste0("[Name #", match_indices, "]"))
    

    outputs:

    > my_string
    # [1] "My name is [Name #1] and his name is [Name #2], [Name #3] again"
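
    If you would rather give a repeated name the same index every time, one possible variation (a sketch, not something the question asked for) is to number the distinct matched strings instead of the match positions. found and name_ids are illustrative variable names; match() and unique() are base R.

    ```r
    my_string <- "My name is <N Timon N> and his name is <N Pumba N>, <N Timon N> again"

    m <- gregexpr("<N(.+?)N>", my_string, perl = TRUE)

    # The matched substrings themselves, in order of appearance
    found <- regmatches(my_string, m)[[1]]

    # Index of each match among the distinct matches (first appearance wins)
    name_ids <- match(found, unique(found))

    regmatches(my_string, m) <- list(paste0("[Name #", name_ids, "]"))

    my_string
    # [1] "My name is [Name #1] and his name is [Name #2], [Name #1] again"
    ```

    The comparison is on the exact matched text, so "<N Timon N>" and "<N  Timon N>" (extra space) would still count as two different names.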