
Removing string parts between substrings when substrings occur multiple times in R


In a string

string="aaaaaaaaaSTARTbbbbbbbbbbSTOPccccccccSTARTddddddddddSTOPeeeeeee"

I would like to remove all parts that occur between START and STOP, yielding

"aaaaaaaaacccccccceeeeeee"

If I try gsub("START(.*)STOP","",string), this gives me "aaaaaaaaaeeeeeee" instead.

What would be the correct way to do this, allowing for multiple occurrences of START and STOP?


Solution

  • Add a ? after the .* to make it lazy (non-greedy), so each match ends at the nearest STOP instead of the last one.

    gsub("START.*?STOP", "", string)
    # [1] "aaaaaaaaacccccccceeeeeee"
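
    To see why the ? matters, here is a small side-by-side sketch using the string from the question. The variable names greedy, lazy and lazy_perl are only illustrative, and the perl = TRUE call is an optional variant, not something the question requires.

    ```r
    string <- "aaaaaaaaaSTARTbbbbbbbbbbSTOPccccccccSTARTddddddddddSTOPeeeeeee"

    # Greedy: .* extends from the first START to the last STOP,
    # so everything in between (including "cccccccc") is removed.
    greedy <- gsub("START.*STOP", "", string)

    # Lazy: .*? stops at the nearest STOP, so each START...STOP
    # pair is matched and removed on its own.
    lazy <- gsub("START.*?STOP", "", string)

    # The same lazy pattern also works with the PCRE engine.
    lazy_perl <- gsub("START.*?STOP", "", string, perl = TRUE)

    print(greedy)     # "aaaaaaaaaeeeeeee"
    print(lazy)       # "aaaaaaaaacccccccceeeeeee"
    print(lazy_perl)  # "aaaaaaaaacccccccceeeeeee"
    ```

    If the real markers contain regex metacharacters such as . or (, they would need to be escaped before being placed in the pattern.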