Tags: .net, regex, regex-lookarounds, balancing-groups

Possible to match at one position of regex but not another (e.g. positional XOR)?


I want to build a larger regular expression out of several sub-expressions, where each sub-expression matches something at one place in the input or at another place, but not at both. Preferably, each "area of interest" would use the same named group in either position. For example, in each line below I'd like to match the volume unit (e.g. gal) and the currency unit (e.g. USD):

  • $3.23 USD / gal.
  • USD 3.23 in gallons
  • 4.50 CAD / gal
  • 1 gal @ USD 3.23
  • 10 gal. @ $4.50 CAD

Or more generally:

  • stuffmorestuffXXXyetmorestuff
  • stuffXXXmorestuff

where XXX is the part I want to capture, and stuff and morestuff could each be a complex set of sub-expressions.

It seems like it might be possible using some combination of

  • group stack push/pop
  • balancing groups
  • look-around

but I'm not sure how to proceed. Does it come down to alternation (|), or to multiple passes with different expressions (which I suppose amounts to the same thing)?
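A related construct, not in the list above, is the conditional group `(?(name)yes|no)`, which both .NET and Python's `re` support: it picks a branch depending on whether a named group has already taken part in the match. The sketch below uses Python as a stand-in for .NET and covers only a bare price phrase such as "USD 3.23" or "4.50 CAD"; the names `CUR`, `AMOUNT`, `PRICE` and `currency_of` are hypothetical. Note that the "not both" half of the XOR relies on `fullmatch` requiring the whole phrase to match; with an unanchored search, a trailing code could still be matched on its own.

```python
import re

# Hypothetical sub-patterns standing in for the question's "stuff";
# CUR is the area of interest (a currency code).
CUR = r"(?:USD|CAD)"
AMOUNT = r"\$?\d+(?:\.\d+)?"

# (?(before)yes|no): if the group 'before' took part in the match, the
# (empty) yes branch adds nothing more; otherwise a code after the
# amount is required. Combined with fullmatch, this gives
# "before XOR after".
PRICE = re.compile(
    rf"(?:(?P<before>{CUR})\s+)?"
    rf"{AMOUNT}"
    rf"(?(before)|\s+(?P<after>{CUR}))"
)


def currency_of(text):
    """Return the currency code if it appears exactly once, before or after the amount."""
    m = PRICE.fullmatch(text)
    if m is None:
        return None
    # At most one of the two groups is set, so this returns whichever matched.
    return m.group("before") or m.group("after")


for sample in ["$3.23 USD", "USD 3.23", "4.50 CAD", "USD 3.23 CAD", "3.23"]:
    print(f"{sample!r:16} -> {currency_of(sample)}")
```

The first three samples yield USD, USD and CAD; "USD 3.23 CAD" (code in both places) and "3.23" (code in neither) yield None.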


Solution

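  • You probably have to use alternation, something like this:

    ^(?:(stuffmorestuff)XXX(yetmorestuff)|(stuff)XXX(morestuff))$

    The non-capturing group (?:...) makes ^ and $ apply to both alternatives; without it, ^ anchors only the first branch and $ only the second.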
    

    But you will end up with four capture groups. Not sure how the .NET regex engine will behave if you use the same group name for several groups.
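The alternation approach above can be sketched in Python, used here as an assumption since the question targets .NET. Python's `re` raises an error when the same group name appears twice, so the sketch gives each branch its own name and merges them after the match (.NET, unlike Python, does accept a repeated group name). `CUR`, `AMOUNT`, `PRICE_ALT` and `currency_by_alternation` are hypothetical names, and the pattern covers only a bare price phrase.

```python
import re

# Hypothetical sub-patterns; replace them with the real "stuff" pieces.
CUR = r"(?:USD|CAD)"
AMOUNT = r"\$?\d+(?:\.\d+)?"

# Two alternatives inside one non-capturing group so that ^ and $
# anchor both branches. Each branch captures the code into its own group.
PRICE_ALT = re.compile(
    rf"^(?:(?P<cur1>{CUR})\s+{AMOUNT}"
    rf"|{AMOUNT}\s+(?P<cur2>{CUR}))$"
)


def currency_by_alternation(text):
    """Return the currency code from whichever branch matched, or None."""
    m = PRICE_ALT.match(text)
    if m is None:
        return None
    # Only one branch can take part in a match, so at most one group is set.
    return m.group("cur1") or m.group("cur2")


for sample in ["USD 3.23", "$4.50 CAD", "USD 3.23 CAD", "3.23"]:
    print(f"{sample!r:16} -> {currency_by_alternation(sample)}")
```

Merging the per-branch groups in code is one way to get a single "currency" value out of the four (here two) capture groups that alternation produces.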