Is it possible to specify a regex that matches x...(a and b)...y within a limited length of n?
To be more precise:
x and y must be at most n characters apart.
An a and a b (regardless of order) must both exist between the matched x and y.
x, a, b, and y here could each stand for a multi-char string snippet.

Test cases (assume n = 10):
Match:
...x..a...b..y...
...x..b...a..y...
...x..a...b..y...a...
...x..a...b..y...b...
...x..a...b..y...y...
...x..a.x.b..y...
...x..a.y.b..y...
...xaabbaabbay...
...x..a...b..y... ...x..a...b..y... (2 matches)
...xaby...xaby... (2 matches)
Don't match:
...x..a...b......
...x......b..y...
...x..a......y...
...x..a...b......y...
...x......b..y...a...
...x..a......y...b...
...x..a.y.b......
...x..a.y.b......y...
...x..a.y.b.x....y...
.a.x......b..y...
.b.x..a......y...
P.S.: I know that in many programming languages this can be done by simply matching /x.{0,n}y/ and then checking whether a and b both exist in the matched string. However, this question explicitly asks for a single-regex approach, so that it can be used as a query in applications such as Google Docs and Notepad++.
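For reference, the two-step idea from the P.S. can be sketched in Python as below. This is only an illustration of the fallback, not the single-regex solution being asked for. The function name find_spans and its parameters are made up for this sketch, and it tries every y within range of an x (shortest span first) instead of relying on the one span that /x.{0,n}y/ happens to pick.

```python
def find_spans(text, x, a, b, y, n):
    """Return non-overlapping substrings x...y whose middle part is at most
    n characters long and contains both a and b, in any order."""
    spans = []
    pos = 0
    while True:
        start = text.find(x, pos)
        if start == -1:
            return spans
        mid_start = start + len(x)
        found = None
        # Try every allowed length of the middle part, shortest first.
        for k in range(n + 1):
            mid_end = mid_start + k
            if mid_end + len(y) > len(text):
                break
            middle = text[mid_start:mid_end]
            # Simplification: plain substring checks, so an overlapping
            # multi-char a and b would both count here.
            if text.startswith(y, mid_end) and a in middle and b in middle:
                found = text[start:mid_end + len(y)]
                break
        if found is None:
            pos = start + 1          # no valid span from this x, try the next one
        else:
            spans.append(found)
            pos = start + len(found)  # continue after the match


print(find_spans("...x..a.y.b..y...", "x", "a", "b", "y", 10))
print(find_spans("...x..a...b......y...", "x", "a", "b", "y", 10))
```

Running it over the test cases above is a quick way to cross-check any single-regex candidate.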
I believe you can use
x(?:(?(1)(?!)|(a))|(?(2)(?!)|(b))|.){0,10}?y(?(1)|(?!))(?(2)|(?!))
See the regex demo. Details:
x - left-hand delimiter
(?:(?(1)(?!)|(a))|(?(2)(?!)|(b))|.){0,10}? - zero to ten occurrences (but as few as possible) of
  (?(1)(?!)|(a)) - if Group 1 is null (if Group 1 was not matched before), match a; else, fail right away to trigger backtracking
  | - or
  (?(2)(?!)|(b)) - if Group 2 is null (if Group 2 was not matched before), match b; else, fail right away to trigger backtracking
  | - or
  . - any one char (other than line break chars by default)
y - right-hand delimiter
(?(1)|(?!)) - conditional construct: if Group 1 participated in the match (if Group 1 value is not null), proceed; else, fail the match
(?(2)|(?!)) - conditional construct: if Group 2 participated in the match (if Group 2 value is not null), proceed; else, fail the match.

The last two conditionals make sure there are both a and b in the matched text.
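To try the pattern outside of an editor, here is a small sketch using Python's re module, which also accepts the (?(1)...|...) conditional syntax. The helper name matches is made up for this example, and other engines (PCRE, Boost in Notepad++) may differ on edge cases, so treat it as a quick check rather than a reference.

```python
import re

# The pattern from above, with single-character x, a, b, y and n = 10.
PATTERN = re.compile(
    r"x(?:(?(1)(?!)|(a))|(?(2)(?!)|(b))|.){0,10}?y(?(1)|(?!))(?(2)|(?!))"
)


def matches(text):
    """Return every full match of PATTERN in text, left to right."""
    return [m.group(0) for m in PATTERN.finditer(text)]


print(matches("...x..a...b..y..."))   # a before b
print(matches("...x..b...a..y..."))   # b before a
print(matches("...xaby...xaby..."))   # two separate matches
print(matches("...x......b..y..."))   # no a between x and y
```

finditer is used instead of findall because the pattern contains capturing groups, and findall would return the group contents rather than the whole match.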
NOTE: If a and b can be multicharacter strings, the quantifier limit must be reduced from n to n-(a.length+b.length-2), because each of a and b is consumed by a single iteration of the quantified group no matter how long it is. So, if a is ac and b is bmc, you need to replace 10 with 10-(2+3-2) => 7. This also means you should check your length limit beforehand: it must be at least the combined length of a and b.
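The adjustment can be automated when the pattern is generated from code. Below is a minimal Python sketch, assuming x, a, b and y are literal strings; the function name build_pattern is hypothetical. It takes the reduction to be (len(a) - 1) + (len(b) - 1), which gives the limit of 7 for the ac / bmc example.

```python
import re


def build_pattern(x, a, b, y, n):
    """Build the conditional regex for literal x, a, b, y with at most
    n characters allowed between x and y."""
    # a and b each take one iteration of the quantified group, however long
    # they are, so the iteration cap shrinks by (len(a) - 1) + (len(b) - 1).
    limit = n - (len(a) + len(b) - 2)
    if limit < 2:
        # Fewer than two iterations cannot fit both a and b.
        raise ValueError("n is too small to hold both a and b")
    ex, ea, eb, ey = (re.escape(s) for s in (x, a, b, y))
    return (
        f"{ex}(?:(?(1)(?!)|({ea}))|(?(2)(?!)|({eb}))|.)"
        f"{{0,{limit}}}?{ey}(?(1)|(?!))(?(2)|(?!))"
    )


pattern = build_pattern("x", "ac", "bmc", "y", 10)
print(pattern)
print(re.search(pattern, "x.ac.bmc..y"))    # 9 chars between x and y
print(re.search(pattern, "x..ac.bmc...y"))  # 11 chars between x and y
```

The ValueError mirrors the advice to check the length limit up front: with n smaller than the combined length of a and b, no match is possible.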
NOTE 2: Add (?s) at the beginning of the pattern to match line break chars with . as well. See "How do I match any character across multiple lines in a regular expression?" for more options.
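A quick way to see the effect of (?s) is to run the same pattern with and without it on text that has a line break between x and y. The sketch below uses Python's re module; the variable names are made up for the example, and whether the inline flag is accepted depends on the regex engine of the target application.

```python
import re

BASE = r"x(?:(?(1)(?!)|(a))|(?(2)(?!)|(b))|.){0,10}?y(?(1)|(?!))(?(2)|(?!))"
WITH_DOTALL = "(?s)" + BASE

# Nine characters between x and y, one of them a newline.
text = "x..a\n..b..y"

print(re.search(BASE, text))         # the dot cannot cross the line break
print(re.search(WITH_DOTALL, text))  # with (?s) the dot matches "\n" too
```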