authentication, email, email-validation

What manipulations can be done to user emails to prevent duplicates


I am working on email-based authentication that checks the database for an existing user by email address and decides whether to create a new account or use the existing one.

The issue I came across is that users sometimes use different capitalisation in their emails, append things like +1 before the @ sign, and so on.

To combat some of these, I am now (1) stripping whitespace from the emails and (2) always lowercasing them.

I would like to take this further, but I am not sure what else I can do without breaking some emails, i.e.:

(3) Can I remove everything from the + sign up to the @ sign? (4) Can I remove other symbols, such as ., from the emails?


Solution

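  • Lowercasing is safe in practice. The domain part (after the @) is always case-insensitive. The local part (before the @) is technically case-sensitive under RFC 5321, but nearly all mail providers treat A and a the same, so changing all upper case to lower case is fine. Digits (0-9) are also valid in email addresses.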

    However, you should not remove any of the following characters from an email address:

    !#$%&'*+-/=?^_`{|}~.
    

    Control characters, white space and other specials are invalid.

    If you discover characters other than letters, digits and the 20 characters listed above, the address is invalid. How such addresses are handled is undefined in the standard.

    Why removing the + is an issue: some mail providers use it to file inbound email into folders for a user, so jack+finance@email.com would go to a finance folder in Jack's mailbox. Other mail providers consider it part of the address itself, so jack+bauer@email.com can be a different account than jack+sparrow@email.com.

    So removing the + (along with the characters after it) could merge different email accounts into one, or produce an address that does not exist.
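
As a rough illustration of the advice above, here is a minimal Python sketch of a conservative normaliser. It only strips surrounding whitespace and lowercases, and it deliberately leaves + and . alone. The function name normalize_email and the regular expression are hypothetical names made up for this example, not part of any library. The sketch assumes the mail provider treats the local part case-insensitively, and it does not validate the domain part or handle quoted local parts.

```python
import re

# Letters, digits, and the 20 symbols listed above, as allowed in an
# unquoted local part. (Illustrative pattern; the domain is not validated.)
_LOCAL_PART_RE = re.compile(r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~.]+")


def normalize_email(raw: str) -> str:
    """Return a conservative lookup key for an email address.

    Only surrounding whitespace and letter case are changed; the + tag
    and any dots are kept, because providers interpret them differently.
    """
    # Assumption: the provider treats the local part case-insensitively.
    email = raw.strip().lower()

    # Split on the last @ so the domain is everything after it.
    local, sep, domain = email.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {raw!r}")

    # Reject whitespace, control characters and other specials.
    if not _LOCAL_PART_RE.fullmatch(local):
        raise ValueError(f"unsupported characters in local part: {raw!r}")

    return f"{local}@{domain}"


if __name__ == "__main__":
    print(normalize_email("  Jack+Finance@Email.com "))
```

With this approach, jack+bauer@email.com and jack+sparrow@email.com remain separate keys, so two genuinely different accounts are never merged. The trade-off is that one person can still register twice using different + tags.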