I am trying to validate email addresses (UTF-8) using the following regular expression:
Regex.IsMatch(emailAddress,
@"^([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$", RegexOptions.CultureInvariant);
It returns false for "ä[email protected]".
Any suggestions on how to improve it?
UTF-8 has nothing to do with this, you're validating a string, not a particular encoding thereof.
Your regex actually returns true for "ä[email protected]" (with or without the CultureInvariant option). Try Console.Write(Regex.IsMatch("ä[email protected]", @"^([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$", RegexOptions.CultureInvariant)); on its own, and you get true.
You will fail on all IDNs like info@ουτοπία.δπθ.gr, and if you care about email addresses that are not restricted to ASCII you may want to include them. (And if you also want to exclude prohibited confusables, it gets really complicated.)
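If you do want to accept IDNs, one possible approach is to convert the domain part to its ASCII (Punycode) form before running any further checks. The following is only a minimal sketch: the class and method names (EmailIdn, ToAsciiDomain) are hypothetical, it assumes the domain is everything after the last @, and it deliberately leaves domain literals such as [1.2.3.4] alone. IdnMapping itself is the standard class in System.Globalization.

```csharp
using System;
using System.Globalization;

static class EmailIdn
{
    // Hypothetical helper: returns the address with its domain converted
    // to ASCII (Punycode) form. The local part is left untouched.
    public static string ToAsciiDomain(string address)
    {
        int at = address.LastIndexOf('@');

        // No '@', or nothing after it: there is no domain to convert.
        if (at < 0 || at == address.Length - 1)
            return address;

        string local = address.Substring(0, at);
        string domain = address.Substring(at + 1);

        // Assumption: domain literals like [1.2.3.4] are not IDNs, so skip them.
        if (domain.StartsWith("[", StringComparison.Ordinal))
            return address;

        // GetAscii throws ArgumentException if the name is not a valid IDN.
        return local + "@" + new IdnMapping().GetAscii(domain);
    }
}
```

Calling EmailIdn.ToAsciiDomain("info@ουτοπία.δπθ.gr") should give an address whose non-ASCII domain labels start with xn--. Whatever validation you run afterwards then only has to deal with an ASCII domain, although your current pattern would still reject a converted non-ASCII top-level domain because of the [a-zA-Z]{2,4} part.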
There are also the problems others have stated with using regular expressions to validate emails, but they boil down to:
The actual email syntax is more complicated than people think (even before we deal with the non-ASCII extensions). For example, did you know that Abc\@[email protected] is a valid email address? It is; in fact, it's an example of a valid address given in RFC 3696.
If you go to the effort of building a perfect validator (it is possible), it'll be a waste of effort. Chances are your email software won't handle them all (e.g. Abc\@[email protected] above won't work with a lot of software), and even then lots of syntactically valid email addresses won't actually be correct.
But anyway, I get true running your code; the bug is elsewhere.
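Since the pattern itself matches, one way to look for the bug is to check the exact string that reaches IsMatch at runtime. This is just a sketch: the InputCheck class, its method names, and the address used in the usage note are made up for illustration, and the pattern is copied from your question.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

static class InputCheck
{
    // Pattern copied unchanged from the question.
    const string Pattern =
        @"^([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$";

    public static bool Matches(string address)
    {
        return Regex.IsMatch(address, Pattern, RegexOptions.CultureInvariant);
    }

    // Dumps the UTF-16 code units of the string as hex, separated by spaces,
    // so invisible or mis-decoded characters become visible.
    public static string CodeUnits(string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            if (sb.Length > 0) sb.Append(' ');
            sb.Append(((int)c).ToString("X4"));
        }
        return sb.ToString();
    }
}
```

For a hypothetical address such as "\u00E4b@example.com", InputCheck.Matches returns true and InputCheck.CodeUnits starts with 00E4 0062. If the dump of your real input shows 00C3 00A4 where you expected 00E4, the text was probably decoded with the wrong encoding before it ever reached the regex, and that would explain the false result. A trailing space or other stray character would show up in the dump the same way.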