
Matching a sequence of non-empty, delimited strings with a Regular Expression


I'm not an expert in regular expressions, and I'm having serious trouble matching a particular pattern.

The pattern is:

A sequence of consecutive, arbitrary words, each marked with a prefix and a suffix. Each word must contain at least one character between its prefix and its suffix.

For example, suppose the prefix is "AB" and the suffix is "YZ". Given this input:

AB----YZAB====YZABYZ//AB++YZ,,,AB====YZAB---YZ

The matched groups should be:

AB----YZAB====YZ , AB++YZ , AB====YZAB---YZ

The group ABYZ should not be matched, because it is "empty" (there is nothing between the prefix and the suffix).

I tried

(AB(.*?)YZ)+

But ABYZ is matched as part of the sequence, because "*" can match nothing. I then tried forcing non-empty groups with

(AB(.+?)YZ)+

I still have no luck: it matches

AB----YZAB====YZABYZ//AB++YZ and AB====YZAB---YZ
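Both failed attempts can be reproduced with Python's `re` module. That is an assumption on my part: the question does not name a regex flavor, but these constructs behave the same way in most backtracking engines. The patterns are copied from the question; `finditer` is used so each result is the whole match rather than the last captured group, and `whole_matches` is a hypothetical helper name.

```python
import re

# Input string from the question.
TEXT = "AB----YZAB====YZABYZ//AB++YZ,,,AB====YZAB---YZ"

def whole_matches(pattern: str, text: str) -> list[str]:
    """Return the full text (group 0) of every non-overlapping match."""
    return [m.group(0) for m in re.finditer(pattern, text)]

# First attempt: .*? may match zero characters, so the empty ABYZ
# is accepted as a third repetition of the group.
lazy_star = whole_matches(r"(AB(.*?)YZ)+", TEXT)

# Second attempt: .+? must consume at least one character, so at the
# empty ABYZ it keeps going ("YZ//AB++") until it reaches the next YZ.
lazy_plus = whole_matches(r"(AB(.+?)YZ)+", TEXT)

print(lazy_star)  # ['AB----YZAB====YZABYZ', 'AB++YZ', 'AB====YZAB---YZ']
print(lazy_plus)  # ['AB----YZAB====YZABYZ//AB++YZ', 'AB====YZAB---YZ']
```

The root cause in both cases is that `.` is allowed to match the A of a following AB, so nothing stops a repetition from swallowing the start of the next word.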

I have tried many other, more complex regexes, without success.

Any help would be very appreciated!


Solution

  • You may use

    (?:AB(?:(?!AB).)+?YZ)+
    


    Details

    • (?:AB(?:(?!AB).)+?YZ)+ - one or more repetitions of
      • AB - an AB substring
      • (?:(?!AB).)+? (or (?:(?!AB|YZ).)+) - one or more characters other than line breaks, as few as possible, none of which starts an AB sequence (a so-called tempered greedy token). Because no content character may begin AB, a repetition cannot run past an empty ABYZ into the next word
      • YZ - a YZ substring.
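The answer's pattern can be checked the same way. This is a sketch that again assumes Python's `re`; the helper `delimited_runs` and its `forbid_suffix` flag are hypothetical (not part of the answer). It builds the pattern for any prefix/suffix pair, using `re.escape` so delimiters containing regex metacharacters are matched literally, and covers both the lazy `(?!AB)` form and the `(?!AB|YZ)` variant.

```python
import re

TEXT = "AB----YZAB====YZABYZ//AB++YZ,,,AB====YZAB---YZ"

def delimited_runs(prefix: str, suffix: str, text: str,
                   forbid_suffix: bool = False) -> list[str]:
    """Find runs of consecutive prefix...suffix words with non-empty content.

    Hypothetical helper. Each content character is guarded by a tempered
    token: it may not start `prefix` (and, if forbid_suffix is True,
    it may not start `suffix` either).
    """
    p, s = re.escape(prefix), re.escape(suffix)
    if forbid_suffix:
        # Equivalent of (?:(?!AB|YZ).)+ : greedy, content never contains the suffix.
        token = f"(?:(?!{p}|{s}).)+"
    else:
        # Equivalent of (?:(?!AB).)+? : lazy, stops at the first suffix after >= 1 char.
        token = f"(?:(?!{p}).)+?"
    pattern = f"(?:{p}{token}{s})+"
    return [m.group(0) for m in re.finditer(pattern, text)]

print(delimited_runs("AB", "YZ", TEXT))
# ['AB----YZAB====YZ', 'AB++YZ', 'AB====YZAB---YZ']
print(delimited_runs("AB", "YZ", TEXT, forbid_suffix=True))
# same result for this input
```

The two variants agree on the question's input but can differ elsewhere: for `ABYZxYZ` the lazy form matches the whole string (treating `YZx` as the content), while the `(?!AB|YZ)` form finds no match at all. Which behavior is right depends on whether a word's content may itself contain the suffix.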