Regex get all matches including smaller submatches


I have the following input string:

Testing <B><I>bold italic</I></B> text. 

and the following regex:

<([A-Z][A-Z0-9]*)\b[^>]*>.*</\1>

This regex only gives the following larger match:

<B><I>bold italic</I></B>

How can I use a regex to get the smaller match?

<I>bold italic</I>

I tried using non-greedy operators, but that didn't work either.

Also, is it possible to get both as match groups, using something like Java or C# match groups or match collections?


Solution

  • Try the regex below, which uses a positive lookbehind:

    (?<=>)<([A-Z][A-Z0-9]*)\b[^>]*>.*<\/\1>
    

    It looks for a tag that starts immediately after a > symbol.

    Explanation:

    • (?<=>) A positive lookbehind, which asserts that the match starts just after a > symbol.
    • < Matches a literal < symbol.
    • ([A-Z][A-Z0-9]*) Captures the tag name; \b[^>]*> then matches up to the next > symbol.
    • .* Matches any character except \n, zero or more times.
    • <\/\1> Matches the literal </, followed by the first captured group, followed by >.
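
Since the question mentions Java, here is a minimal sketch of how the lookbehind regex might be applied there. The class name NestedTagMatches and the helper findAll are hypothetical names chosen for illustration. The second pattern is not part of the answer above: it is an assumption that wrapping the question's original regex in a capturing lookahead is an acceptable way to collect the overlapping outer and inner matches in a single pass. Both patterns still use a greedy .*, so input with several sibling tags of the same name may over-match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedTagMatches {
    // Regex from the answer: a tag that starts right after a '>' character.
    // In a Java string literal the backslashes are doubled, and '/' needs no escape.
    static final Pattern INNER =
        Pattern.compile("(?<=>)<([A-Z][A-Z0-9]*)\\b[^>]*>.*</\\1>");

    // Assumption, not from the answer: the question's regex wrapped in a lookahead.
    // The lookahead consumes nothing, so the matcher retries at every position and
    // nested (overlapping) tags are each captured in group 1. The tag name moves
    // to group 2, hence the \2 backreference.
    static final Pattern OVERLAPPING =
        Pattern.compile("(?=(<([A-Z][A-Z0-9]*)\\b[^>]*>.*</\\2>))");

    // Hypothetical helper: collects the given group from every match of p in input.
    static List<String> findAll(Pattern p, int group, String input) {
        List<String> results = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            results.add(m.group(group));
        }
        return results;
    }

    public static void main(String[] args) {
        String input = "Testing <B><I>bold italic</I></B> text. ";
        // Inner match only, using the lookbehind regex (whole match is group 0).
        System.out.println(findAll(INNER, 0, input));
        // Outer and inner matches, using the lookahead variant (group 1).
        System.out.println(findAll(OVERLAPPING, 1, input));
    }
}
```

For the input string from the question, the first call should yield only <I>bold italic</I>, and the second should yield the outer <B>...</B> match followed by the inner <I>...</I> match.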