I am on the verge of losing my mind trying to fix an email regex I built.
It is almost perfect for what I need and works in 99.9% of all cases.
But there is one case that causes catastrophic backtracking, and I cannot fix my regex for it.
The "email" causing the catastrophic backtracking:
jasmin.martinez@tester.co.rolisa-brown.king@tester.co.ro
Yes, such emails do occur in the application I need this regex for.
People sometimes enter multiple emails in one field. I do not know why this happens.
I need the help of the wizards of Stack Overflow.
My email regex might block or fail to block some officially valid emails, but that is not the point here.
All I want is to fix the catastrophic backtracking problem in my regex. I do not want to change what it blocks or allows. It works for what I need it to do.
Here is my email regex:
^[^\W_]+\w*(?:[.-]\w*)*[^\W_]+@[^\W_]+(?:[.-]?\w*[^\W_]+)*(?:\.[^\W_]{2,})$
How can I make this regex fail quickly so it doesn't cause catastrophic backtracking?
Thank you very much.
You can use
^(?!_)\w+(?:[.-]\w+)*(?<!_)@[^\W_]+(?>[.-]?\w*[^\W_])*\.[^\W_]{2,}$
See the regex demo.
The main idea is introducing an atomic group, (?>[.-]?\w*[^\W_])*, where backtracking into the group pattern is not allowed, and the revamped pattern before @, (?!_)\w+(?:[.-]\w+)*(?<!_), which matches one or more word chars (the first of which cannot be _), followed by zero or more sequences of . or - plus one or more word chars, where the whole part cannot end in _.
The + in ^[^\W_]+ is redundant because the next \w* already matches the same chars, so it can be removed. The same idea is behind removing + in [^\W_]@.
Note that the last non-capturing group is redundant, so I removed it.
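To check the behavior on the input from the question, here is a minimal sketch in Python. Python is my choice for illustration only; the question does not say which regex flavor is in use. Atomic groups are only available in Python's `re` module from 3.11 on, so the sketch falls back to the lookahead-plus-backreference idiom `(?=(X))\1` on older versions. That fallback and the `is_valid` helper are my assumptions, not part of the pattern above.

```python
import re
import time

# Pattern from the answer; the atomic group (?>...) needs Python 3.11+.
ATOMIC = r"^(?!_)\w+(?:[.-]\w+)*(?<!_)@[^\W_]+(?>[.-]?\w*[^\W_])*\.[^\W_]{2,}$"

# Fallback for older Pythons (an assumption, not from the answer):
# (?=(X))\1 emulates (?>X), because a lookahead is never re-entered
# once it has succeeded.
EMULATED = (
    r"^(?!_)\w+(?:[.-]\w+)*(?<!_)@[^\W_]+"
    r"(?:(?=([.-]?\w*[^\W_]))\1)*\.[^\W_]{2,}$"
)

try:
    EMAIL_RE = re.compile(ATOMIC)
except re.error:
    EMAIL_RE = re.compile(EMULATED)


def is_valid(address):
    """Hypothetical helper: True if the whole string matches the pattern."""
    return EMAIL_RE.match(address) is not None


good = "jasmin.martinez@tester.co.ro"
bad = "jasmin.martinez@tester.co.rolisa-brown.king@tester.co.ro"

start = time.perf_counter()
results = (is_valid(good), is_valid(bad))
elapsed = time.perf_counter() - start

print(results)  # (True, False)
print(f"checked both inputs in {elapsed:.6f}s")
```

The second input is rejected because neither the part before @ nor the domain part can consume an @, and the atomic group keeps the engine from retrying every way of splitting the domain before it gives up.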
An ASCII-only version:
^(?!_)[A-Za-z0-9_]+(?:[.-][A-Za-z0-9_]+)*(?<!_)@[A-Za-z0-9]+(?>[.-]?[A-Za-z0-9_]*[A-Za-z0-9])*\.[A-Za-z0-9]{2,}$
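The ASCII-only version matters in flavors where \w is Unicode-aware. Python's str patterns are one such case, so here is a rough sketch of the difference using only the character classes. The sample name is made up for illustration.

```python
import re

name = "jörg"  # hypothetical local part containing a non-ASCII letter

# \w is Unicode-aware for str patterns in Python, so it accepts "ö".
unicode_word = re.fullmatch(r"\w+", name) is not None

# The explicit class used in the ASCII-only version does not.
ascii_class = re.fullmatch(r"[A-Za-z0-9_]+", name) is not None

# In Python, re.ASCII restricts \w to [A-Za-z0-9_] without rewriting the pattern.
ascii_flag = re.fullmatch(r"\w+", name, flags=re.ASCII) is not None

print(unicode_word, ascii_class, ascii_flag)  # True False False
```

In Python specifically, compiling the first pattern with re.ASCII should therefore give the same character coverage as the spelled-out ASCII version. Other flavors have their own switches for this, so check the documentation for the engine you actually use.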