regex pcre between

Using regex to find multiple matches between two strings


Imagine I have a string like this:

c x c x A c x c x c B c x c x

And I want to find any "c" character that is between "A" and "B". So in this example I need to get 3 matches.

I know that I can use lookahead and lookbehind tokens. So I used this regex:

(?<=A).*c.*(?=B)

But it matches the whole string between A and B, c x c x c, as one result.

And if I remove the .* parts, there will be no match at all.

I made an example here so you can see the results.
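
The two outcomes described above can be reproduced with a short script. This is only a sketch: it assumes Python's built-in `re` module rather than the PCRE engine the question targets, which should behave the same way for these two particular patterns and this input.

```python
import re

text = "c x c x A c x c x c B c x c x"

# Greedy .* on both sides swallows the whole A...B span into a single match.
greedy = re.findall(r"(?<=A).*c.*(?=B)", text)
print(greedy)  # [' c x c x c ']

# Without .*, c would have to touch both A and B directly, so nothing matches.
strict = re.findall(r"(?<=A)c(?=B)", text)
print(strict)  # []
```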


Solution

  • There are two common scenarios here: 1) A and B are different single-character strings, 2) A and B are different multicharacter strings.

    Scenario 1

    You may use negated character classes:

    (?:\G(?!^)|A)[^AB]*?\Kc(?=[^AB]*B)
    

    See this regex demo. Details:

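    • (?:\G(?!^)|A) - either A or the end of the previous successful match (\G also matches at the start of the string, so the (?!^) lookahead rules that position out)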
    • [^AB]*? - any zero or more chars other than A and B, as few as possible
    • \K - match reset operator that discards all text matched so far from the overall match value
    • c - a c char/string
    • (?=[^AB]*B) - a positive lookahead that requires zero or more chars other than A and B, and then a B char, immediately to the right of the current location.
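
Python's built-in `re` module supports neither \G nor \K, so the pattern above cannot be run there as written. A rough two-pass substitute (a different technique from the single regex above, shown only as a sketch) first captures each A...B span that contains no other A or B, then searches for c inside it. The function name `find_c_between` is a hypothetical helper, not part of the original answer.

```python
import re

# Pass 1: an A, a run of chars that are neither A nor B, then a B.
SPAN = re.compile(r"A([^AB]*)B")
# Pass 2: the pattern to look for inside each captured run.
TARGET = re.compile(r"c")

def find_c_between(text):
    """Return (offset, match) pairs for every c found between A and B."""
    found = []
    for span in SPAN.finditer(text):
        # pos/endpos restrict the search to the captured run while
        # keeping offsets relative to the original string.
        for hit in TARGET.finditer(text, span.start(1), span.end(1)):
            found.append((hit.start(), hit.group()))
    return found

sample = "c x c x A c x c x c B c x c x"
print(find_c_between(sample))  # [(10, 'c'), (14, 'c'), (18, 'c')]
```

On the sample string from the question this yields the three expected matches. Edge cases may not line up exactly with the PCRE pattern.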

    Scenario 2

    If A and B are placeholders for multichar strings, say, ABC and BCE, and c is some pattern like c\d+ (to match c and one or more digits after it), use

    (?s)(?:\G(?!^)|ABC)(?:(?!ABC).)*?\Kc\d+(?=.*?BCE)
    

    See this regex demo. Details:

    • (?s) - a DOTALL modifier that makes . match any char, including line break chars
    • (?:\G(?!^)|ABC) - ABC or end of the previous successful match
    • (?:(?!ABC).)*? - any char, zero or more times but as few as possible, that does not start an ABC char sequence
    • \K - match reset operator
    • c\d+ - c and one or more digits
    • (?=.*?BCE) - a positive lookahead that requires any zero or more chars, as few as possible, followed by BCE to the right of the current location.
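
The same two-pass idea can be sketched for multichar delimiters with Python's built-in `re`, which lacks \G and \K. This is a swapped-in technique rather than the regex above: it assumes each ABC is closed by the nearest following BCE, so it may differ from the PCRE pattern on inputs with several BCE occurrences after one ABC. The name `find_tokens_between` and the sample string are hypothetical.

```python
import re

# Pass 1: ABC, the shortest run of chars that never starts another ABC, then BCE.
SPAN = re.compile(r"ABC((?:(?!ABC).)*?)BCE", re.DOTALL)
# Pass 2: c followed by one or more digits.
TARGET = re.compile(r"c\d+")

def find_tokens_between(text):
    """Return every c\\d+ token located between ABC and the next BCE."""
    found = []
    for span in SPAN.finditer(text):
        # Search only inside the captured run between the delimiters.
        for hit in TARGET.finditer(text, span.start(1), span.end(1)):
            found.append(hit.group())
    return found

sample = "c1 ABC c2 x\nc33 BCE c4 ABC c5"
print(find_tokens_between(sample))  # ['c2', 'c33']
```

Here c1 and c4 lie outside any ABC...BCE span, and c5 follows an ABC that has no closing BCE, so only c2 and c33 are returned.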