
Regex: stop a continuous match when it reaches a specific symbol


I want to remove every character other than letters and digits between two symbols, < and >, by replacing them with an empty string. The string is <F=*A*B*C*>

 (?<=F=|\G(?!^))[A-Za-z1-9]*\K[^A-Za-z1-9]+

 //output:<F=ABC 

 (?:^<F=(?=.+>$)|\G(?!^))[A-Za-z1-9]*\K[^A-Za-z1-9]+
 
 //output:<F=ABC 

These patterns also match the final closing > and remove it, giving <F=ABC. How can I make the match stop at a specific symbol so that the closing > is not matched?

When I add > to the negated class [^A-Za-z1-9], the pattern removes the unwanted characters and leaves the > symbol in place.

(?<=F=|\G(?!^))[A-Za-z1-9]*\K[^A-Za-z1-9>]+

//output: <F=ABC> // desired result

What is the correct way to make the match stop at this symbol? Thank you.


Solution

  • You can use

    (?:\G(?!^)|<F=)[^<>]*?\K[^A-Za-z0-9<>]+(?=[^<>]*>)
    

    Details:

    • (?:\G(?!^)|<F=) - either the end of the previous match or <F= text
    • [^<>]*? - any zero or more chars other than < and >, as few as possible
    • \K - match reset operator that discards the text matched so far from the overall match memory buffer
    • [^A-Za-z0-9<>]+ - one or more chars other than ASCII letters/digits and < and > chars
    • (?=[^<>]*>) - immediately to the right, there must be zero or more chars other than < and >, followed by a > char.
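
The pattern above relies on \G and \K, which are available in PCRE-style engines but not in Python's built-in re module. For readers on such an engine, here is a minimal sketch of a different, two-step technique that should give the same result for the sample string: match each whole <F=...> span, then strip the unwanted characters from its content in a callback. The function name clean_f_tags and the second sample input are illustrative assumptions, not part of the original answer.

```python
import re

# One <F=...> span; the content between "<F=" and ">" is captured in group 1.
TAG = re.compile(r'<F=([^<>]*)>')
# Runs of characters that are not ASCII letters or digits.
NON_ALNUM = re.compile(r'[^A-Za-z0-9]+')


def clean_f_tags(text: str) -> str:
    """Remove non-alphanumeric characters inside each <F=...> span.

    Hypothetical helper: the delimiters "<F=" and ">" are re-emitted
    unchanged, so the closing ">" can never be removed.
    """
    return TAG.sub(
        lambda m: '<F=' + NON_ALNUM.sub('', m.group(1)) + '>',
        text,
    )


print(clean_f_tags('<F=*A*B*C*>'))  # <F=ABC>
```

Because the closing > sits outside the captured group, there is no need for a lookahead to protect it. Text outside an <F=...> span is left untouched.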