I have a .NET Standard 2.1 C# application that needs to run Regex in the ECMAScript flavor.
According to the MSDN documentation, I can use RegexOptions.ECMAScript:
Enables ECMAScript-compliant behavior for the expression.
I know that the \A anchor is not supported in ECMAScript (according to link, and confirmed when I tried Regex101 with the ECMAScript option), but it seems that .NET does support it. Example:
Regex ecmaRegex = new Regex(@"\A\d{3}", RegexOptions.ECMAScript);
var matches = ecmaRegex.Matches("901-333-");
Console.WriteLine($"number of matches: {matches.Count}"); // number of matches: 1
Console.WriteLine($"The match: {matches[0]}"); // The match: 901
I expected to get no matches at all. What am I missing?
The answer is further along in the documentation, in the "ECMAScript Matching Behavior" article.
This option does NOT redefine the meaning of the .NET-specific anchors; they are still supported.
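A minimal sketch of that point, assuming only the standard System.Text.RegularExpressions API. The names AnchorDemo and IsThreeDigits are made up for illustration. It suggests that the .NET-specific anchors \A and \z are still accepted and honored when RegexOptions.ECMAScript is set:

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Hypothetical helper: true when the whole input is exactly three ASCII digits.
    // \A and \z are .NET anchors; RegexOptions.ECMAScript does not disable them.
    public static bool IsThreeDigits(string input) =>
        Regex.IsMatch(input, @"\A\d{3}\z", RegexOptions.ECMAScript);

    public static void Main()
    {
        Console.WriteLine(IsThreeDigits("901"));      // True
        Console.WriteLine(IsThreeDigits("901-333-")); // False: \z requires end of input
    }
}
```

If the pattern must be portable to a real ECMAScript engine, the portable anchors ^ and $ (without the Multiline option) are the safer choice, since the option alone will not reject \A or \z.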
The behavior of ECMAScript and canonical regular expressions differs in three areas: character class syntax, self-referencing capturing groups, and octal versus backreference interpretation.
Character class syntax. Because canonical regular expressions support Unicode whereas ECMAScript does not, character classes in ECMAScript have a more limited syntax, and some character class language elements have a different meaning. For example, ECMAScript does not support language elements such as the Unicode category or block elements \p and \P. Similarly, the \w element, which matches a word character, is equivalent to the [a-zA-Z_0-9] character class when using ECMAScript and [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}\p{Lm}] when using canonical behavior. For more information, see Character Classes.
Self-referencing capturing groups. A regular expression capture class with a backreference to itself must be updated with each capture iteration.
Resolution of ambiguities between octal escapes and backreferences.
Regular expression | Canonical behavior | ECMAScript behavior |
---|---|---|
\0 followed by 0 to 2 octal digits | Interpret as an octal. For example, \044 is always interpreted as an octal value and means "$". | Same behavior. |
\ followed by a digit from 1 to 9, followed by no additional decimal digits | Interpret as a backreference. For example, \9 always means backreference 9, even if a ninth capturing group does not exist. If the capturing group does not exist, the regular expression parser throws an ArgumentException. | If a single decimal digit capturing group exists, backreference to that digit. Otherwise, interpret the value as a literal. |
\ followed by a digit from 1 to 9, followed by additional decimal digits | Interpret the digits as a decimal value. If that capturing group exists, interpret the expression as a backreference. Otherwise, interpret the leading octal digits up to octal 377; that is, consider only the low 8 bits of the value. Interpret the remaining digits as literals. For example, in the expression \3000, if capturing group 300 exists, interpret as backreference 300; if capturing group 300 does not exist, interpret as octal 300 followed by 0. | Interpret as a backreference by converting as many digits as possible to a decimal value that can refer to a capture. If no digits can be converted, interpret as an octal by using the leading octal digits up to octal 377; interpret the remaining digits as literals. |
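Two of the differences quoted above can be checked with a short sketch. It assumes only the standard System.Text.RegularExpressions API; the names EcmaDifferencesDemo, IsWordChar and CanParse are hypothetical helpers, not part of .NET. The first pair of calls compares \w in both modes, and the second pair compares how a single-digit backreference to a missing group is handled:

```csharp
using System;
using System.Text.RegularExpressions;

public static class EcmaDifferencesDemo
{
    // \w: canonical mode accepts Unicode letters, ECMAScript mode only [a-zA-Z_0-9].
    public static bool IsWordChar(string s, RegexOptions options) =>
        Regex.IsMatch(s, @"^\w$", options);

    // Returns true when the pattern can be constructed,
    // false when the parser rejects it with an ArgumentException.
    public static bool CanParse(string pattern, RegexOptions options)
    {
        try
        {
            _ = new Regex(pattern, options);
            return true;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // U+00E9 (e with acute accent) is a lowercase letter, category Ll.
        Console.WriteLine(IsWordChar("\u00E9", RegexOptions.None));       // True
        Console.WriteLine(IsWordChar("\u00E9", RegexOptions.ECMAScript)); // False

        // (a)\2 refers to a second capturing group that does not exist.
        Console.WriteLine(CanParse(@"(a)\2", RegexOptions.None));       // False
        Console.WriteLine(CanParse(@"(a)\2", RegexOptions.ECMAScript)); // True
    }
}
```

None of these checks involve anchors, which is consistent with the answer: the option changes character classes and backreference parsing, not the set of supported anchors.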