Tags: javascript, html, regex, html-input, ecmascript-next

Why is my regex valid with the RegExp u flag, but not with the v flag and does not work in HTML pattern attribute?


I am getting the below console warning for this regex pattern:

^[a-zA-Z0-9+_.-]+@[a-zA-Z0-9]+\\.[a-zA-Z0-9]+$

Pattern attribute value ^[a-zA-Z0-9+_.-]+@[a-zA-Z0-9]+\.[a-zA-Z0-9]+$ is valid with the RegExp u flag, but not with the v flag:
Uncaught SyntaxError: Invalid regular expression: /^[a-zA-Z0-9+_.-]+@[a-zA-Z0-9]+\.[a-zA-Z0-9]+$/v: Invalid character in character class.

I cannot see how to create a valid regex pattern for this warning. Please, could somebody explain the error and how to resolve it?

I tried looking at the documentation, but could not see how to make the pattern valid for the v flag.


Solution

  • The issue is that the HTML pattern attribute now automatically uses the newly introduced v flag when it compiles its value into a RegExp object.

    The value provided to a pattern attribute is converted into a regular expression with the v flag for unicode sets.

    The <input> pattern attribute reference states:

    The compiled pattern regular expression of an input element, if it exists, is a JavaScript RegExp object. It is determined as follows:

    1. If the element does not have a pattern attribute specified, then return nothing. The element has no compiled pattern regular expression.

    2. Let pattern be the value of the pattern attribute of the element.

    3. Let regexpCompletion be RegExpCreate(pattern, "v").

    4. If regexpCompletion is an abrupt completion, then return nothing. The element has no compiled pattern regular expression.
      Note: User agents are encouraged to log this error in a developer console, to aid debugging.

    5. Let anchoredPattern be the string "^(?:", followed by pattern, followed by ")$".

    6. Return ! RegExpCreate(anchoredPattern, "v").

    The v flag applies stricter escaping rules than the u flag. Because it enables character class subtraction (--) and intersection (&&), a literal - inside a character class can no longer be left unescaped, even at the end of the class.

    With the u flag there is no such restriction; with the v flag there is. Compare:

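    // u flag: the unescaped trailing hyphen is accepted (user@example.com is a placeholder address)
    console.log(/^[a-zA-Z0-9+_.-]+@[a-zA-Z0-9]+\.[a-zA-Z0-9]+$/u.test("user@example.com")) // true
    // v flag: the literal hyphen must be written as \-
    console.log(/^[a-zA-Z0-9+_.\-]+@[a-zA-Z0-9]+\.[a-zA-Z0-9]+$/v.test("user@example.com")) // true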

    So, always escape literal hyphens inside character classes in ECMAScript patterns. Here, that means writing [a-zA-Z0-9+_.\-] instead of [a-zA-Z0-9+_.-] in the pattern attribute.

    Here are more details on which patterns are now considered invalid:

    Some previously valid patterns are now errors, specifically those with a character class including either an unescaped special character ( ) [ { } / - | (note: \ and ] also require escaping inside a character class, but this is already true with the u flag) or a double punctuator:

    [(]
    [)]
    [[]
    [{]
    [}]
    [/]
    [-]
    [|]
    [&&]
    [!!]
    [##]
    [$$]
    [%%]
    [**]
    [++]
    [,,]
    [..]
    [::]
    [;;]
    [<<]
    [==]
    [>>]
    [??]
    [@@]
    [``]
    [~~]
    [^^^]
    [_^^]