c# regex powershell backtracking

E-mail Regex causing catastrophic backtracking error


I am on the verge of losing my mind trying to fix an email regex I built.

It is almost perfect for what I need. It works in 99.9% of all cases.

But there is one case that causes a catastrophic backtracking error, and I cannot fix my regex for it.

The "email" causing the catastrophic backtracking error:

jasmin.martinez@tester.co.rolisa-brown.king@tester.co.ro

Yes, such emails do occur in the application I need this regex for.

People enter multiple emails in one field for some reason. I have no answer for why this occurs.

I need the help of the wizards of Stack Overflow.

My email regex might block, or fail to block, some officially valid emails, but that is not the point here.

All I want is to fix the catastrophic backtracking problem in my regex. I do not want to change what it blocks or allows. It works for what I need it to do.

Here is my Email Regex:

^[^\W_]+\w*(?:[.-]\w*)*[^\W_]+@[^\W_]+(?:[.-]?\w*[^\W_]+)*(?:\.[^\W_]{2,})$

How can I make this regex fail quickly so that it doesn't cause a catastrophic backtracking error?

Thank you very much.


Solution

  • You can use

    ^(?!_)\w+(?:[.-]\w+)*(?<!_)@[^\W_]+(?>[.-]?\w*[^\W_])*\.[^\W_]{2,}$
    


    The main idea is to introduce an atomic group, (?>[.-]?\w*[^\W_])*: once the group has matched, the engine is not allowed to backtrack into it and try other ways of splitting the same text. The part before @ is also reworked as (?!_)\w+(?:[.-]\w+)*(?<!_). It matches one or more word chars (the first of which cannot be _), followed by zero or more sequences of . or - and one or more word chars, and the (?<!_) lookbehind ensures that the char right before @ is not _.
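To make the effect of the atomic group concrete, here is a minimal C# sketch that runs the pattern above against both a normal address and the doubled input from the question. The class name `EmailCheck`, the helper `IsValid`, and the one-second match timeout are assumptions added for illustration. The timeout is only an extra safety net and is not part of the answer's fix.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailCheck
{
    // Pattern from the answer; (?>...) is the atomic group.
    private const string Pattern =
        @"^(?!_)\w+(?:[.-]\w+)*(?<!_)@[^\W_]+(?>[.-]?\w*[^\W_])*\.[^\W_]{2,}$";

    // Assumption: a one-second match timeout as an additional guard.
    private static readonly Regex EmailRegex =
        new Regex(Pattern, RegexOptions.None, TimeSpan.FromSeconds(1));

    // Hypothetical helper: a timed-out match is treated as "not valid".
    public static bool IsValid(string input)
    {
        try
        {
            return EmailRegex.IsMatch(input);
        }
        catch (RegexMatchTimeoutException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // A single address matches.
        Console.WriteLine(IsValid("jasmin.martinez@tester.co.ro")); // True

        // The doubled input contains two @ characters, so it is rejected.
        Console.WriteLine(
            IsValid("jasmin.martinez@tester.co.rolisa-brown.king@tester.co.ro")); // False
    }
}
```

The atomic group stops the engine from re-splitting text it has already consumed inside one iteration. The outer * can still give back whole iterations, which is how the final \.[^\W_]{2,} finds the last label of a valid address.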

    The + in ^[^\W_]+ is redundant because the following \w* already matches the same chars, so it can be removed. The same reasoning applies to the + in [^\W_]+@.

    The last non-capturing group, (?:\.[^\W_]{2,}), is also redundant, so I removed it.
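The redundancy claim can be checked on a fragment of the pattern. The sketch below compares three anchored variants of the start of the local part: the original `[^\W_]+\w*`, the same without the `+`, and the lookahead form used in the answer. The class name `PrefixCheck`, the helper `AllAgree`, and the sample strings are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PrefixCheck
{
    // Original fragment: two adjacent quantifiers that can match the same chars.
    private static readonly Regex Original = new Regex(@"^[^\W_]+\w*$");

    // Same fragment without the redundant +.
    private static readonly Regex NoPlus = new Regex(@"^[^\W_]\w*$");

    // Form used in the answer: word chars, but not starting with _.
    private static readonly Regex Lookahead = new Regex(@"^(?!_)\w+$");

    // True when all three variants give the same verdict for s.
    public static bool AllAgree(string s)
    {
        bool expected = Original.IsMatch(s);
        return NoPlus.IsMatch(s) == expected && Lookahead.IsMatch(s) == expected;
    }

    public static void Main()
    {
        string[] samples = { "jasmin", "j", "a_b", "_ab", "ab_", "", "a.b" };
        Console.WriteLine(samples.All(AllAgree)); // True
    }
}
```

All three variants accept the same strings. The difference is that `[^\W_]+\w*` gives the engine several ways to divide one run of letters between the two quantifiers, and each of those divisions is retried when a later part of the pattern fails.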

    [Images: two regex graphs of the pattern, the second one from debuggex.]

    An ASCII-only version (in .NET, \w also matches non-ASCII letters and digits by default):

    ^(?!_)[A-Za-z0-9_]+(?:[.-][A-Za-z0-9_]+)*(?<!_)@[A-Za-z0-9]+(?>[.-]?[A-Za-z0-9_]*[A-Za-z0-9])*\.[A-Za-z0-9]{2,}$
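The difference between the two versions only shows up on non-ASCII input. The following sketch compares them in C#. The class name `AsciiVsUnicode` and both sample addresses are made up for illustration, and `\u00F6` is the letter ö.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AsciiVsUnicode
{
    // \w-based version: in .NET, \w matches Unicode letters and digits too.
    public static readonly Regex UnicodeAware = new Regex(
        @"^(?!_)\w+(?:[.-]\w+)*(?<!_)@[^\W_]+(?>[.-]?\w*[^\W_])*\.[^\W_]{2,}$");

    // ASCII-only version with explicit character classes.
    public static readonly Regex AsciiOnly = new Regex(
        @"^(?!_)[A-Za-z0-9_]+(?:[.-][A-Za-z0-9_]+)*(?<!_)@[A-Za-z0-9]+" +
        @"(?>[.-]?[A-Za-z0-9_]*[A-Za-z0-9])*\.[A-Za-z0-9]{2,}$");

    public static void Main()
    {
        string ascii = "lisa-brown.king@tester.co.ro";
        string nonAscii = "j\u00F6rg@tester.co.ro"; // made-up sample with a non-ASCII letter

        Console.WriteLine(UnicodeAware.IsMatch(ascii));    // True
        Console.WriteLine(AsciiOnly.IsMatch(ascii));       // True
        Console.WriteLine(UnicodeAware.IsMatch(nonAscii)); // True
        Console.WriteLine(AsciiOnly.IsMatch(nonAscii));    // False
    }
}
```

Which version fits depends on whether the application should accept non-ASCII letters in addresses. The original question does not say, so treat the choice as open.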