
Regex positive lookahead for "contains 10-14 digits" not working right


I've got a Regular Expression meant to validate that a phone number string is either empty, or contains 10-14 digits in any format. It works for requiring a minimum of 10 but continues to match beyond 14 digits. I've rarely used lookaheads before and am not seeing the problem. Here it is with the intended interpretation in comments:

///  ^                      - Beginning of string
/// (?=                     - Look ahead from current position
///      (?:\D*\d){10,14}       - Match 0 or more non-digits followed by a digit, 10-14 times
///      \D*$                   - Ending with 0 or more non-digits
/// .*                      - Allow any string
/// $                       - End of string
^(?=(?:\D*\d){10,14}\D*|\s*$).*$

This is being used in an ASP.NET MVC 5 site with System.ComponentModel.DataAnnotations.RegularExpressionAttribute, so it runs server-side with .NET regexes and client-side in JavaScript with jQuery Validate. How can I get it to stop matching if the string contains more than 14 digits?


Solution

  • The problem with the regular expression

    ^(?=(?:\D*\d){10,14}\D*|\s*$).*$
    

    is that there is no end-of-string anchor ($) between \D* and |. Consider, for example, the string

    12345678901234567890
    

    which contains 20 digits. The lookahead will be satisfied because (?:\D*\d){10,14} will match

    12345678901234
    

    and then \D* will match zero non-digits. By contrast, the regex

    ^(?=(?:\D*\d){10,14}\D*$|\s*$).*$
    

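    will fail to match that string (as it should): after 10 to 14 digits have been consumed, \D*$ cannot reach the end of the string because more digits remain.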
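
The effect of the missing anchor can be checked directly. The following is a minimal sketch in JavaScript (the client-side engine mentioned in the question); the sample inputs are made up for illustration and are not from the question.

```javascript
// Pattern from the question: the first alternative has no `$` after `\D*`.
const original = /^(?=(?:\D*\d){10,14}\D*|\s*$).*$/;
// Same pattern with `$` added after `\D*`.
const anchored = /^(?=(?:\D*\d){10,14}\D*$|\s*$).*$/;

// Illustrative inputs (assumptions, not from the question).
const samples = [
  "",                      // empty string
  "555-123-4567",          // 10 digits
  "+1 (555) 123-4567 x12", // 13 digits
  "12345678901234567890",  // 20 digits
];

// Record the digit count and the verdict of each pattern for every input.
const results = samples.map((input) => ({
  input,
  digits: input.replace(/\D/g, "").length,
  original: original.test(input),
  anchored: anchored.test(input),
}));

console.table(results);
```

Only the 20-digit input should differ between the two columns: the original pattern accepts it, the anchored one rejects it.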

    There is, however, no need for a lookahead. One can simplify the earlier expression to

    ^(?:(?:\D*\d){10,14}\D*)?$
    

    Demo

    Making the outer non-capture group optional allows the regex to match empty strings, as required.
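
As a rough check of the lookahead-free version, here is a small JavaScript sketch. The helper name `isValidPhone` and the test inputs are hypothetical, chosen only for illustration.

```javascript
// Lookahead-free pattern: an optional group holding 10-14 digits,
// each preceded by any number of non-digits, then trailing non-digits.
const simplified = /^(?:(?:\D*\d){10,14}\D*)?$/;

// Hypothetical helper wrapping the pattern.
function isValidPhone(value) {
  return simplified.test(value);
}

// Illustrative inputs (assumptions, not from the question).
const checks = {
  empty: isValidPhone(""),                     // no digits at all
  tenDigits: isValidPhone("(555) 123-4567"),   // 10 digits
  nineDigits: isValidPhone("555-123-456"),     // 9 digits
  fifteenDigits: isValidPhone("123456789012345"),
  spacesOnly: isValidPhone("   "),             // whitespace, no digits
};

console.log(checks);
```

One difference worth noting: the pattern in the question has a \s*$ alternative, so it accepts a whitespace-only string, whereas this simplified pattern accepts only the truly empty string. If whitespace-only input should pass, that alternative would need to be kept.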

    There may be a problem with this last regex, as illustrated at the link. Consider the string

    \nabc12\nab12c3456d789efg
    

    The first match of (?:\D*\d) will be \nabc1 (as \D matches newlines), the second will be 2, the third \nab1, and so on, for a total of 11 repetitions, satisfying the requirement that there be 10-14 digits even though the digits are spread over several lines. This undoubtedly is not intended. The solution is to change the regex to

    ^(?:(?:[^\d\n]*\d){10,14}[^\d\n]*)?$
    

    [^\d\n] matches any character other than a digit or a newline.

    Demo
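
The newline behaviour can be reproduced with a short JavaScript sketch. It tests the whole string without the multiline flag, which is an assumption about how the pattern is applied; the variable names are illustrative.

```javascript
// Pattern using \D, which also matches newline characters.
const simplified = /^(?:(?:\D*\d){10,14}\D*)?$/;
// Pattern using [^\d\n], which refuses to cross a newline.
const singleLine = /^(?:(?:[^\d\n]*\d){10,14}[^\d\n]*)?$/;

// The multi-line string from the answer, with real newline characters.
const multiLine = "\nabc12\nab12c3456d789efg";

// Count the digits so the 10-14 requirement can be checked by eye.
const digitCount = (multiLine.match(/\d/g) || []).length;

const outcome = {
  digitCount,
  simplifiedMatches: simplified.test(multiLine),
  singleLineMatches: singleLine.test(multiLine),
  singleLineOrdinary: singleLine.test("555-123-4567"),
};

console.log(outcome);
```

The \D version accepts the multi-line string because its 11 digits fall inside the allowed range, while the [^\d\n] version rejects it and still accepts an ordinary single-line number.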