Tags: c#, asp.net, regex, asp.net-mvc-2, email-validation

Why is this Email regex so slow on Mvc?


I am currently building a system with ASP.NET MVC 2 and C# that uses the following regex:

^([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*@([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$

This regex checks that an e-mail address is in a valid format. My code is as follows:

if (!Regex.IsMatch(model.Email, @"^([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*@([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$"))
                ModelState.AddModelError("Email", "The field Email is invalid.");

The regex works fine for validating e-mails. However, if a particularly long, invalid string is passed to it, the system keeps 'working' and the page never finishes loading. For instance, this is the data that I tried to pass:

iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii

The above string causes the system to essentially lock up. I would like to know why, and whether I can use a regex that accomplishes the same thing in a simpler manner. My goal is that an incorrectly formed e-mail address, such as the following, is rejected:

[email protected]

Solution

  • You have nested repetition operators sharing the same characters, which is liable to cause catastrophic backtracking.

    For example: ([-.\w]*[0-9a-zA-Z])*

    This says: match 0 or more of -._0-9a-zA-Z followed by a single 0-9a-zA-Z, and repeat that whole group zero or more times.

    i falls in both of these classes.

    Thus, when run on iiiiiiii... the regex tries every possible way of splitting the run into groups of (several "i"s followed by one "i") before it can conclude that there is no match, and the number of such splits grows exponentially with the length of the input.

    In general, validating email addresses with a regular expression is hard.
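
One way to remove the ambiguity described above, offered as a sketch rather than a definitive fix: any number of repetitions of ([-.\w]*[0-9a-zA-Z]) is itself just a run of -.\w characters ending in 0-9a-zA-Z, so the outer * can be replaced by ? without changing which strings are accepted. The class name EmailFormat, the method name IsValid, and the 250 ms timeout are hypothetical choices for illustration. The timeout overload is an assumption about the runtime: it needs .NET 4.5 or later, which is newer than the framework MVC 2 normally targets.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative and not part of the original code.
public static class EmailFormat
{
    // Accepts the same strings as the original pattern, but the nested
    // "([-.\w]*[0-9a-zA-Z])*" in the local part is collapsed into a single
    // optional group, so a run of "i"s can only be split in one way.
    private const string Pattern =
        @"^[0-9a-zA-Z](?:[-.\w]*[0-9a-zA-Z])?@(?:[0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9}$";

    // Assumption: .NET 4.5+, where the Regex constructor accepts a match timeout.
    // The timeout is a safety net in case some other input still matches slowly.
    private static readonly Regex EmailRegex =
        new Regex(Pattern, RegexOptions.None, TimeSpan.FromMilliseconds(250));

    public static bool IsValid(string email)
    {
        if (string.IsNullOrEmpty(email))
            return false;

        try
        {
            return EmailRegex.IsMatch(email);
        }
        catch (RegexMatchTimeoutException)
        {
            // Input that takes too long to evaluate is treated as invalid.
            return false;
        }
    }
}
```

With this in place, the check in the question would become if (!EmailFormat.IsValid(model.Email)) followed by the same ModelState.AddModelError call. Like the original pattern, this one still requires each domain label to be at least two characters long, so it remains a format check and not a guarantee that the address exists.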