This question relates to character class subtraction in regular expressions (regex). I am referring to the regex flavour of XPath 2.0 (Second Edition).
When a negative group appears within a character class subtraction, is the subtraction operator (-) applied before or after the negation operator (^)?
The text of the XPath / XML Schema specification is below, but to my mind it reads ambiguously.
For any ·positive character group· or ·negative character group· G, and any ·character class expression· C, G-C is a valid ·character class subtraction·, identifying the set of all characters in C(G) that are not also in C(C).
To be more specific, consider the following three regexes:
being matched against the haystack text of:
What are the possible match texts (first and subsequent)?
I don't think that text is ambiguous, if we are lenient enough to read G-C as [G-[C]], and a negative group, ^G, as [^G]. Now it looks clear that the caret is part of the first group, and does not negate both groups.
Therefore, [^abc-[ad]] would match:
{All characters besides a, b and c} \ {a and d} = {All characters besides a, b, c and d}
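The set arithmetic above can be checked with a small sketch. Python's built-in re module has no character class subtraction syntax, so this models the two possible readings with plain sets instead of running the XPath regex itself. The universe of lowercase ASCII letters is an assumption for illustration (the real universe is every character), and the helper names are made up for this example.

```python
import re
import string

# Toy universe for illustration only; a real engine works over all characters.
UNIVERSE = set(string.ascii_lowercase)


def negative_group(chars):
    """Model [^chars]: every character of the universe not listed in chars."""
    return UNIVERSE - set(chars)


def subtract(group, char_class):
    """Model G-C: characters in G that are not also in C."""
    return group - char_class


# Reading 1 (the one argued for above): the caret belongs to the first group,
# so [^abc-[ad]] means ([^abc]) minus ([ad]).
caret_first = subtract(negative_group("abc"), set("ad"))

# Reading 2 (hypothetical alternative): the caret negates the whole
# subtraction, i.e. everything except (abc minus ad).
caret_last = UNIVERSE - subtract(set("abc"), set("ad"))

# Reading 1 is the same set that the ordinary class [^abcd] matches.
equivalent = {c for c in UNIVERSE if re.fullmatch(r"[^abcd]", c)}

print(sorted(UNIVERSE - caret_first))  # characters excluded under reading 1
print(sorted(UNIVERSE - caret_last))   # characters excluded under reading 2
```

Under the caret-first reading, a and d are both excluded. Under the other reading they would match, so testing [^abc-[ad]] against a or d in an engine that supports subtraction is enough to tell the two readings apart.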
Keep in mind, you can easily test this to see the behavior :)
As a bonus, .NET regular expressions also support this feature, making it a little easier to test online.
See also: Character Class Subtraction