
Selecting Inner nested text only


I'm using a regex to select custom tags, but some of these tags contain inner tags of the same name. I want to select only the inner tags, so I can process them first.

My regex is getting mixed up. I think this might require a recursive match, but I'm not sure how to write one.

\[STORE.*?\]((.*?|\n)*)\[\/STORE\]

Text:

 [STORE SMC, DODO]blah blah blah blah blah

   [STORE SMC]blah[/STORE]

   [STORE DODO]Blah[/STORE].

 [/STORE]

 Some text here I do not want selected.

 [STORE SMC]blah[/STORE]

Do you want to select the tags in a separate run or in the same run?


Solution

  • You can use this regex, which uses a negative lookahead to assert that no other [STORE ...] opening tag appears in between, before the closing [/STORE]:

    \[STORE [^\]]*\](?:(?!\[STORE [^\]]*\])[\s\S])*?\[\/STORE\]
    


    This matches the innermost STORE blocks, as well as top-level STORE blocks that contain no nested STORE tags.
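    To check which blocks the pattern picks up, here is a small Java harness run against the question's sample text. It uses the Java form of the regex with DOTALL; the class name InnerStoreTags and the method findInnerStores are made up for illustration, so treat this as a sketch rather than a finished utility.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InnerStoreTags {
    // Tempered token: each "." is only consumed if it does not start another
    // "[STORE ...]" opening tag, so a match can never contain a nested STORE block.
    // DOTALL lets "." cross line breaks.
    static final Pattern INNER_STORE = Pattern.compile(
            "\\[STORE [^\\]]*\\](?:(?!\\[STORE [^\\]]*\\]).)*?\\[/STORE\\]",
            Pattern.DOTALL);

    // Hypothetical helper: returns every innermost (or non-nested top-level)
    // STORE block, in document order.
    static List<String> findInnerStores(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = INNER_STORE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = " [STORE SMC, DODO]blah blah blah blah blah\n\n"
                + "   [STORE SMC]blah[/STORE]\n\n"
                + "   [STORE DODO]Blah[/STORE].\n\n"
                + " [/STORE]\n\n"
                + " Some text here I do not want selected.\n\n"
                + " [STORE SMC]blah[/STORE]\n";
        for (String match : findInnerStores(text)) {
            System.out.println(match);
        }
    }
}
```

    The outer [STORE SMC, DODO] block is skipped because its body runs into [STORE SMC] before reaching a [/STORE], so this prints the two nested blocks followed by the trailing independent [STORE SMC]blah[/STORE].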

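    The regex above is written in JavaScript syntax. If you're using Java, you can use the code below. (In Salesforce Apex, strings are single-quoted and Pattern.compile takes no flags argument, so put the inline (?s) flag at the start of the pattern instead of passing Pattern.DOTALL.)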

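    import java.util.regex.Pattern;
    String regex = "\\[STORE [^\\]]*\\](?:(?!\\[STORE [^\\]]*\\]).)*?\\[/STORE\\]";
    // DOTALL makes "." also match line breaks, so blocks spanning several lines are found
    final Pattern pattern = Pattern.compile(regex, Pattern.DOTALL);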
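    Since the goal is to process inner tags first, one way to handle arbitrary nesting without a recursive regex (which Java's regex engine does not support) is to repeat the match: replace every innermost block, then search again, because once the children are gone their parents become innermost. The sketch below assumes that approach; the class name, the capture groups added to the pattern, and the process method are all hypothetical placeholders for your real handling.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ProcessStoresInnermostFirst {
    // Same tempered-token regex, with groups added:
    // group 1 = tag arguments (e.g. "SMC, DODO"), group 2 = block body.
    static final Pattern INNER_STORE = Pattern.compile(
            "\\[STORE ([^\\]]*)\\]((?:(?!\\[STORE [^\\]]*\\]).)*?)\\[/STORE\\]",
            Pattern.DOTALL);

    // Hypothetical stand-in for the real processing of one STORE block.
    // Its output must not contain "[STORE ..." or "[/STORE]", otherwise the
    // loop below would keep matching the text it just produced.
    static String process(String args, String body) {
        return "<" + args.trim() + ":" + body + ">";
    }

    // Replaces all innermost blocks in one pass, then repeats until no
    // STORE block is left; parents are handled after their children.
    static String processAll(String text) {
        while (true) {
            Matcher m = INNER_STORE.matcher(text);
            StringBuffer out = new StringBuffer();
            boolean found = false;
            while (m.find()) {
                found = true;
                // quoteReplacement stops "$" or "\" in the output from being
                // treated as group references
                m.appendReplacement(out, Matcher.quoteReplacement(process(m.group(1), m.group(2))));
            }
            if (!found) {
                return text;
            }
            m.appendTail(out);
            text = out.toString();
        }
    }

    public static void main(String[] args) {
        String text = "[STORE SMC, DODO]a [STORE SMC]b[/STORE] c[/STORE] d [STORE SMC]e[/STORE]";
        // prints "<SMC, DODO:a <SMC:b> c> d <SMC:e>"
        System.out.println(processAll(text));
    }
}
```

    On the first pass only [STORE SMC]b[/STORE] and [STORE SMC]e[/STORE] match; the outer block is processed on the second pass, after its child has already been replaced.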