Given two strings (denoted by A and B) and a set N of strings, I need to write a regular expression to test whether a given input string W contains a substring S that satisfies all of the following three conditions: 1. S starts with A; 2. S ends with B; 3. no element of N occurs in the part between A and B (this part does not overlap with A or B).
For example, I chose "ab" as A, "bc" as B, and ["a", "cb", "cd"] as N. If "ec" is the inner part, then "abecbc" is a string that satisfies all three conditions: if W contains such a substring, the regex must return true. My first variant is the following regex:
var T = /(?=ab.*bc)(?=(?!ab.*a.*bc))(?=(?!ab.*cb.*bc))(?=(?!ab.*cd.*bc))/;
I chose W = S = "abecbc". This regex works as expected:
T.test("abecbc");
// true
But I am interested in the following problem: how can I write a functionally equivalent regex without using the positive lookahead (?=) as the AND operator?
So my second variant is the following:
var R = /ab(?!.*?(?:a|cb|cd).*)bc/;
But R.test("abecbc") evaluates to false. So let us split R into three parts: /ab(.*)/.test("abecbc") returns true. Then /(.*)bc/.test("abecbc") returns true.
The inner part (i.e. the part between "ab" and "bc") is "ec". And /(?!.*?(?:a|cb|cd).*)/.test("ec") returns true, which is expected. So all three parts evaluate to true, and there are no more parts in R. Then why does /ab(?!.*?(?:a|cb|cd).*)bc/.test("abecbc") evaluate to false? And how can I write a correct regex that solves the problem described in the first paragraph of the post without using the positive lookahead (?=) as the AND operator?
EDIT
My question is not a duplicate of this question: I need an explanation of why the particular regex (R) returns false instead of true. Another difference is that I do not need to test whether the inner part contains a specified string.
Your attempted regex R = /ab(?!.*?(?:a|cb|cd).*)bc/ fails to match abecbc because a negative lookahead pattern is a zero-width assertion: it consumes no characters, so with your regex bc has to immediately follow ab. And if you try fixing it by adding .* before bc, the lookahead still scans the entire rest of the string, so there's no guarantee that a match of a|cb|cd occurs between ab and bc rather than after bc.
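Both failure modes can be checked directly. The snippet below is only a small sketch: the test strings "abebc", "abbc" and "abebca" and the name naive are made up for illustration and do not come from the question.

```javascript
// R from the question: the lookahead sits between "ab" and "bc" but consumes nothing.
const R = /ab(?!.*?(?:a|cb|cd).*)bc/;

// The lookahead passes on "ebc" (no forbidden string), yet "bc" is still
// required at the position right after "ab", where "e" is found instead.
const rOnAbebc = R.test("abebc"); // false

// With nothing between "ab" and "bc", the same regex matches.
const rOnAbbc = R.test("abbc"); // true

// Hypothetical "fix": let .* consume the inner part.
const naive = /ab(?!.*?(?:a|cb|cd).*).*bc/;

// The lookahead now scans everything after "ab", including the text after "bc",
// so the trailing "a" rejects a string that contains the valid substring "abebc".
const naiveOnAbebc = naive.test("abebc");   // true
const naiveOnAbebca = naive.test("abebca"); // false, although "abebc" qualifies

console.log(rOnAbebc, rOnAbbc, naiveOnAbebc, naiveOnAbebca);
```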
You can instead capture B and what comes after it so that you can use the capture as an ending in a negative lookahead assertion to avoid a match when there's any of N between A and B:
ab(?=.*?(bc.*))(?!.*(?:a|cb|cd).*\1).*?bc
Demo: https://regex101.com/r/NqLbfV/4
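As a quick sanity check of this pattern in JavaScript, here is a minimal sketch; the variable name first and every test string other than "abecbc" are illustrative choices, not taken from the question.

```javascript
// The capture \1 holds the first "bc" after "ab" plus everything after it,
// so ".*\1" inside the negative lookahead can only end at that first "bc".
const first = /ab(?=.*?(bc.*))(?!.*(?:a|cb|cd).*\1).*?bc/;

const results = {
  abecbc: first.test("abecbc"),   // true: inner part "ec" contains no element of N
  abacbc: first.test("abacbc"),   // false: inner part "ac" contains "a"
  abxcdbc: first.test("abxcdbc"), // false: inner part "xcd" contains "cd"
  abebca: first.test("abebca"),   // true: the "a" after "bc" is outside the inner part
};

// The match is non-greedy: it stops at the first "bc" after "ab".
const matched = "xxabebcyy".match(first)[0]; // "abebc"

console.log(results, matched);
```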
EDIT: The solution above performs a non-greedy match, but since you later indicated in the comments that you desire a greedy match, you can instead capture what comes before A and use a negative lookbehind assertion to avoid an occurrence of any of N between A and B:
(?<=(.*))ab.*(?<!\1ab.*(?:a|cb|cd).*)bc
Demo: https://regex101.com/r/7xuUNP/2
Note that this requires that your browser supports variable-width lookbehind patterns, which is currently the case for all major modern browsers.
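To see the difference between the two patterns, the sketch below runs both on the same input. It assumes an engine with lookbehind support (for example a recent Node.js); the names nonGreedy and greedy and the input "abebcfbc" are illustrative only.

```javascript
// Lookahead-based pattern: stops at the first "bc" after "ab".
const nonGreedy = /ab(?=.*?(bc.*))(?!.*(?:a|cb|cd).*\1).*?bc/;

// Lookbehind-based pattern: \1 holds everything before "ab", and the negative
// lookbehind rejects a closing "bc" if an element of N sits between "ab" and it.
const greedy = /(?<=(.*))ab.*(?<!\1ab.*(?:a|cb|cd).*)bc/;

const input = "abebcfbc";
const shortMatch = input.match(nonGreedy)[0]; // "abebc"
const longMatch = input.match(greedy)[0];     // "abebcfbc"

// Both patterns agree on the examples from the question's setup.
const greedyOnAbecbc = greedy.test("abecbc"); // true
const greedyOnAbacbc = greedy.test("abacbc"); // false: inner part contains "a"

console.log(shortMatch, longMatch, greedyOnAbecbc, greedyOnAbacbc);
```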