
Regex for `x...(a and b)...y` within a limited length?


Is it possible to specify a regex that matches x...(a and b)...y within a limited length of n?

To be more precise:

  1. The number of characters between the matched x and y must be at most n.
  2. Both an a and a b (regardless of order) must exist between the matched x and y.
  3. x, a, b, and y may each stand for a multi-character string.

Test cases (assume n = 10):

Match:
...x..a...b..y...
...x..b...a..y...
...x..a...b..y...a...
...x..a...b..y...b...
...x..a...b..y...y...
...x..a.x.b..y...
...x..a.y.b..y...
...xaabbaabbay...
...x..a...b..y... ...x..a...b..y... (2 matches)
...xaby...xaby... (2 matches)

Don't match:
...x..a...b......
...x......b..y...
...x..a......y...
...x..a...b......y...
...x......b..y...a...
...x..a......y...b...
...x..a.y.b......
...x..a.y.b......y...
...x..a.y.b.x....y...
.a.x......b..y...
.b.x..a......y...

P.S.: I know that in many programming languages this can be done by simply matching /x.{0,n}y/ and then checking whether both a and b occur in the matched string. However, this question explicitly asks for a single-regex approach, so that it can be used as a search query in applications such as Google Docs and Notepad++.
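Taken literally, the two-step approach needs some care. With n = 10, a single lazy x.{0,n}?y match on ...x..a.y.b..y... stops at the first y and misses the qualifying span, while a greedy x.{0,n}y on ...xaby...xaby... merges both occurrences into one match. Below is a minimal Python sketch of a two-step check that avoids both problems by trying every y inside the window, nearest first. `two_step_matches` is a hypothetical helper, and x, a, b and y are treated as plain substrings:

```python
def two_step_matches(text: str, x: str, a: str, b: str, y: str, n: int) -> list[str]:
    """Non-overlapping x...y spans (at most n chars inside) that contain a and b."""
    matches = []
    pos = 0
    while True:
        start = text.find(x, pos)
        if start == -1:
            return matches
        inner_start = start + len(x)
        end = None
        # Step 1: candidate x...y spans within the length limit, shortest first
        j = text.find(y, inner_start)
        while j != -1 and j - inner_start <= n:
            inner = text[inner_start:j]
            # Step 2: keep the span only if both a and b occur inside it
            if a in inner and b in inner:
                end = j + len(y)
                break
            j = text.find(y, j + 1)
        if end is None:
            pos = start + 1          # no qualifying y: retry from the next x
        else:
            matches.append(text[start:end])
            pos = end                # continue after this match


if __name__ == "__main__":
    print(two_step_matches("...xaby...xaby...", "x", "a", "b", "y", 10))
```

This is plain code rather than a query, so it does not meet the single-regex requirement; it is only useful as a reference for checking a regex against.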


Solution

  • I believe you can use

    x(?:(?(1)(?!)|(a))|(?(2)(?!)|(b))|.){0,10}?y(?(1)|(?!))(?(2)|(?!))
    

    See the regex demo. Details:

    • x - left-hand delimiter
    • (?:(?(1)(?!)|(a))|(?(2)(?!)|(b))|.){0,10}? - zero to ten occurrences (but as few as possible) of
      • (?(1)(?!)|(a)) - if Group 1 has not matched yet, match a and capture it into Group 1; otherwise, fail immediately with (?!) so the engine backtracks to the next alternative
      • | - or
      • (?(2)(?!)|(b)) - if Group 2 has not matched yet, match b and capture it into Group 2; otherwise, fail immediately with (?!) so the engine backtracks to the next alternative
      • |. - or any single character (except line break characters, by default)
    • y - right-hand delimiter
    • (?(1)|(?!)) - conditional construct: if Group 1 participated in the match (its value is not null), continue; otherwise, fail the match
    • (?(2)|(?!)) - conditional construct: if Group 2 participated in the match (its value is not null), continue; otherwise, fail the match.

    The last two conditionals ensure that both a and b occur in the matched text.
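Below is a minimal Python sketch that runs the pattern against the question's test cases. It assumes Python's built-in `re` module, which supports the same `(?(1)yes|no)` conditionals and the always-failing `(?!)`. The names `PATTERN`, `CASES` and `run_cases` are illustrative. Passing here does not by itself show that Notepad++ or Google Docs behave the same way:

```python
import re

# The pattern from the answer, for n = 10 and single-character x, a, b, y.
PATTERN = re.compile(r"x(?:(?(1)(?!)|(a))|(?(2)(?!)|(b))|.){0,10}?y(?(1)|(?!))(?(2)|(?!))")

# Test cases from the question, mapped to the expected number of matches.
CASES = {
    "...x..a...b..y...": 1,
    "...x..b...a..y...": 1,
    "...x..a...b..y...a...": 1,
    "...x..a...b..y...b...": 1,
    "...x..a...b..y...y...": 1,
    "...x..a.x.b..y...": 1,
    "...x..a.y.b..y...": 1,
    "...xaabbaabbay...": 1,
    "...x..a...b..y... ...x..a...b..y...": 2,
    "...xaby...xaby...": 2,
    "...x..a...b......": 0,
    "...x......b..y...": 0,
    "...x..a......y...": 0,
    "...x..a...b......y...": 0,
    "...x......b..y...a...": 0,
    "...x..a......y...b...": 0,
    "...x..a.y.b......": 0,
    "...x..a.y.b......y...": 0,
    "...x..a.y.b.x....y...": 0,
    ".a.x......b..y...": 0,
    ".b.x..a......y...": 0,
}


def run_cases() -> list[str]:
    """Return the inputs whose match count differs from the expected count."""
    failures = []
    for text, expected in CASES.items():
        # finditer yields whole matches; findall would return group tuples instead
        found = [m.group(0) for m in PATTERN.finditer(text)]
        if len(found) != expected:
            failures.append(text)
    return failures


if __name__ == "__main__":
    print(run_cases() or "all test cases pass")
```

`finditer` is used instead of `findall` because the pattern contains capture groups, and `findall` would return the group contents rather than the matched x...y spans.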

    NOTE: If a and b can be multicharacter strings, the repetition limit must be reduced to n-(a.length+b.length-2), because each of a and b is consumed by a single iteration of the group. So, if a is ac and b is bmc, you need to replace 10 with 10-(2+3-2) => 7. This also means that n must be at least a.length+b.length; otherwise, no match is possible.
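The limit arithmetic from the NOTE can be folded into a small pattern builder. Below is a sketch in Python, assuming x, a, b and y are literal strings that can be escaped with `re.escape`. `build_pattern` is a hypothetical helper, not part of the original answer. For single-character inputs and n = 10, it reproduces the pattern above exactly:

```python
import re


def build_pattern(x: str, a: str, b: str, y: str, n: int) -> re.Pattern:
    """Compile x...(a and b)...y with at most n characters between x and y.

    x, a, b and y are treated as literal strings (escaped with re.escape).
    """
    # Groups 1 and 2 are each captured exactly once per match, so a match spends
    # one iteration on a, one on b, and one per remaining character:
    #   chars = (limit - 2) + len(a) + len(b)  =>  limit = n - (len(a) + len(b) - 2)
    limit = n - (len(a) + len(b) - 2)
    if limit < 2:
        raise ValueError("n must be at least len(a) + len(b)")
    body = (
        rf"(?:(?(1)(?!)|({re.escape(a)}))"  # capture a only while group 1 is unset
        rf"|(?(2)(?!)|({re.escape(b)}))"    # capture b only while group 2 is unset
        rf"|.){{0,{limit}}}?"               # otherwise consume any one char, lazily
    )
    tail = r"(?(1)|(?!))(?(2)|(?!))"        # fail unless both groups participated
    return re.compile(re.escape(x) + body + re.escape(y) + tail)


if __name__ == "__main__":
    p = build_pattern("x", "ac", "bmc", "y", 10)
    print(p.pattern)                  # contains {0,7}? as computed in the NOTE
    print(p.search("x..ac.bmc..y"))   # 10 chars between x and y: a match
    print(p.search("x...ac.bmc..y"))  # 11 chars between x and y: None
```

The limit is exact rather than approximate because each of the two groups can fire at most once. Every other character between x and y therefore costs exactly one iteration.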

    NOTE 2: Add (?s) at the beginning of the pattern to make . match line break characters as well. See How do I match any character across multiple lines in a regular expression? for more options.
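Below is a minimal Python illustration of NOTE 2, using the n = 10 pattern unchanged. Python's `re` also accepts a leading `(?s)` inline flag, which is equivalent to passing `re.DOTALL`. The sample text is made up for illustration:

```python
import re

BASE = r"x(?:(?(1)(?!)|(a))|(?(2)(?!)|(b))|.){0,10}?y(?(1)|(?!))(?(2)|(?!))"
text = "x..a\n..b..y"  # a line break sits between a and b

print(re.search(BASE, text))           # None: '.' cannot cross the newline
print(re.search("(?s)" + BASE, text))  # with (?s), the whole text matches
```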