regex sublimetext non-greedy

Regex to match a multiline string that starts with x, ends with y, and contains z but not x in the middle


This is easier to explain with an example.

This is the text:

<li>hello 
THE WORDS
</li>

<li> cruel </li>

<li> world THE WORDS </li>

I want to find strings that start with <li>, end with </li>, and contain THE WORDS. I expect to match only <li> hello THE WORDS </li> and <li> world THE WORDS </li>.

What I tried: (?s)<li>.*?(THE WORDS).*?</li>

With this, the second match is <li> cruel </li> <li> world THE WORDS </li>.

I am using Sublime Text.


Solution

  • EDIT: For the new requirements, use this regex:

    (?s)<li>(?:(?!</li>).)*?THE WORDS.*?</li>
    

    Explanation

    • (?s) activates DOTALL mode, allowing the dot to match across lines
    • <li> matches literal chars
    • (?:(?!</li>).) asserts that what follows is not </li>, then matches one character
    • The *? quantifier repeats that group lazily, so we match as little as possible up to....
    • THE WORDS, matched literally
    • .*? then lazily matches any characters up to....
    • the literal </li>
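
As a rough check outside Sublime, the sketch below runs the attempted pattern and the regex above against the sample text using Python's re module. Python is only a stand-in here: Sublime Text uses a different regex engine, and the assumption is that (?s), negative lookahead, and lazy quantifiers behave the same way in both. The variable names are made up for illustration.

```python
import re

# Sample text from the question (three list items, the first spans lines).
text = """<li>hello
THE WORDS
</li>

<li> cruel </li>

<li> world THE WORDS </li>"""

# Attempted pattern: the lazy .*? can still run past a closing </li>.
naive = re.compile(r"(?s)<li>.*?(THE WORDS).*?</li>")

# Tempered pattern: a character is consumed only if </li> does not start there,
# so the part before THE WORDS can never leave the current <li> item.
tempered = re.compile(r"(?s)<li>(?:(?!</li>).)*?THE WORDS.*?</li>")

naive_matches = [m.group(0) for m in naive.finditer(text)]
tempered_matches = [m.group(0) for m in tempered.finditer(text)]

for label, matches in (("naive", naive_matches), ("tempered", tempered_matches)):
    print(label)
    for found in matches:
        print("   ", repr(found))
```

Both patterns find two matches, but the second match of the attempted pattern starts at the "cruel" item, while the tempered pattern skips that item entirely.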

    Original Answer (different requirements):

    Use this simple regex (tested in Sublime):

    x+[^xy]*z[^xy]*y+
    

    See matches in the regex demo.

    Explanation

    • x+ matches one or more x chars
    • [^xy]* matches any chars that are neither an x nor a y
    • z matches the z we want (ensuring there is at least one)
    • [^xy]* matches any chars that are neither an x nor a y
    • y+ matches one or more y chars
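
The same kind of check can be sketched for the single-character version. The sample strings are invented for illustration, and Python's re module again stands in for Sublime's engine, on the assumption that negated character classes behave the same way in both (including matching newlines).

```python
import re

pattern = re.compile(r"x+[^xy]*z[^xy]*y+")

# Hypothetical inputs: the first three should match, the last two should not.
samples = [
    "xazby",     # x ... z ... y
    "xx z yy",   # repeated x and y at the ends
    "x\nz\ny",   # [^xy] also matches newlines, so no DOTALL flag is needed
    "xaby",      # no z between x and y
    "xazxby",    # an x in the middle blocks the match
]

results = {s: bool(pattern.search(s)) for s in samples}

for sample, matched in results.items():
    print(repr(sample), "->", matched)
```

Because [^xy] excludes both delimiters, the match can never run past a y or restart at a later x, which is what makes the lazy quantifier unnecessary in this version.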