I'm trying to create a PHP PCRE regex that is (almost) fully compatible with RFC 5321 and RFC 5322 to test email addresses. The only thing I don't require is the (comment) part. I've seen some other attempts at this posted on here, but when I run tests against them, they don't all work.
I have been working on one that is very close:
^(([\w \!\#\$\%\&\'\*\+\-\/\=\?\^\`\{\|\}\~\.]{1,64})|("[\w \!\#\$\%\&\'\*\+\-\/\=\?\^\`\{\|\}\~\.]{1,64}"))@(([\w\-]*\.?[\w\-]*)|(\[\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}\])|(\[IPv6:[\da-fA-F]{0,4}:[\da-fA-F]{0,4}:[\da-fA-F]{0,4}:[\da-fA-F]{0,4}\]))$
To break it down:
Local part:
(
Match between 1 and 64 of the allowed characters:
([\w \!\#\$\%\&\'\*\+\-\/\=\?\^\`\{\|\}\~\.]{1,64})
|
OR match the same set of characters in a quoted string:
("[\w \!\#\$\%\&\'\*\+\-\/\=\?\^\`\{\|\}\~\.]{1,64}")
)
end local part.
match @ sign
@
match domain part:
(
match domain part using allowed characters:
([\w\-]*\.?[\w\-]*)
or IPv4 (it doesn't check that each octet is at most 255; that would be handled elsewhere)
(\[\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}\])
or IPv6
(\[IPv6:[\da-fA-F]{0,4}:[\da-fA-F]{0,4}:[\da-fA-F]{0,4}:[\da-fA-F]{0,4}\])
)
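For anyone who wants to try this outside regex101.com, here is a minimal PHP sketch that assembles the pieces above into the full pattern. The variable names ($chars, $local, $domain, $ipv4, $ipv6, $pattern) are illustrative only, and the / delimiters are an assumption, since the regex above is shown without any.

```php
<?php
// 1 to 64 allowed local-part characters, copied verbatim from the breakdown.
// A nowdoc keeps every backslash and quote exactly as written.
$chars = <<<'RE'
[\w \!\#\$\%\&\'\*\+\-\/\=\?\^\`\{\|\}\~\.]{1,64}
RE;

// Local part: the unquoted form OR the same characters inside double quotes.
$local = '((' . $chars . ')|("' . $chars . '"))';

// Domain alternatives, copied as-is. Note that the dots in the IPv4 piece
// are unescaped, so each one matches any character, not only a literal dot.
$domain = '([\w\-]*\.?[\w\-]*)';
$ipv4   = '(\[\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}\])';
$ipv6   = '(\[IPv6:[\da-fA-F]{0,4}:[\da-fA-F]{0,4}:[\da-fA-F]{0,4}:[\da-fA-F]{0,4}\])';

// Anchor the whole thing and add the (assumed) / delimiters.
$pattern = '/^' . $local . '@(' . $domain . '|' . $ipv4 . '|' . $ipv6 . ')$/';

var_dump(preg_match($pattern, 'bob.smith@smith.com')); // int(1)
var_dump(preg_match($pattern, 'bob@[192.168.0.1]'));   // int(1)
var_dump(preg_match($pattern, 'Abc.example.com'));     // int(0)
```

Building the pattern from named pieces keeps the long character class in one place, so a change to the allowed characters only has to be made once.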
The only thing it's missing is the ability to reject multiple consecutive periods (.) that are outside a quoted local-part. I tested it on regex101.com against all the addresses below, which combine some of my own test cases with those from the Wikipedia article about email addresses:
bob@smith.com
bob.smith@smith.com
bob-smith@smith.com
bob-smith@bob-smith.com
b0b!-...smith@smith.com <-DOES NOT VALIDATE CORRECTLY - MULTIPLE .'s
bob&smith@smith.com
"bob..smith"@smith.com
simple@example.com
very.common@example.com
disposable.style.email.with+symbol@example.com
other.email-with-hyphen@example.com
fully-qualified-domain@example.com
user.name+tag+sorting@example.com
x@example.com
example-indeed@strange-example.com
admin@mailserver1
example@s.example
" "@example.org
"john..doe"@example.org
Abc.example.com
A@b@c@example.com
a"b(c)d,e:f;g<h>i[j\k]l@example.com
just"not"right@example.com
this is"not\allowed@example.com
this\ still\"not\\allowed@example.com
1234567890123456789012345678901234567890123456789012345678901234+x@example.com
john..doe@example.com <-DOES NOT VALIDATE CORRECTLY - MULTIPLE .'s
john.doe@example..com
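The same check can be scripted instead of pasted into regex101.com. The sketch below is only one way this might be done in PHP: it wraps the regex above in / delimiters (an assumption), runs a hand-picked subset of the list, and reports the addresses whose result differs from what is expected. The function name findMismatches is hypothetical, and the true/false expectations follow the valid/invalid grouping of the list plus the two notes above.

```php
<?php
// The regex from above, verbatim, wrapped in / delimiters (an assumption).
$pattern = <<<'RE'
/^(([\w \!\#\$\%\&\'\*\+\-\/\=\?\^\`\{\|\}\~\.]{1,64})|("[\w \!\#\$\%\&\'\*\+\-\/\=\?\^\`\{\|\}\~\.]{1,64}"))@(([\w\-]*\.?[\w\-]*)|(\[\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}\])|(\[IPv6:[\da-fA-F]{0,4}:[\da-fA-F]{0,4}:[\da-fA-F]{0,4}:[\da-fA-F]{0,4}\]))$/
RE;

// address => whether it should be accepted
$cases = [
    'bob@smith.com'           => true,
    '"bob..smith"@smith.com'  => true,
    'admin@mailserver1'       => true,
    '" "@example.org'         => true,
    'b0b!-...smith@smith.com' => false,
    'Abc.example.com'         => false,
    'A@b@c@example.com'       => false,
    'john..doe@example.com'   => false,
    'john.doe@example..com'   => false,
];

// Hypothetical helper: returns the addresses whose actual result
// differs from the expected one.
function findMismatches(string $pattern, array $cases): array
{
    $mismatches = [];
    foreach ($cases as $address => $shouldMatch) {
        $matched = preg_match($pattern, $address) === 1;
        if ($matched !== $shouldMatch) {
            $mismatches[] = $address;
        }
    }
    return $mismatches;
}

print_r(findMismatches($pattern, $cases));
```

With this subset, the only mismatches reported should be the two addresses flagged above: b0b!-...smith@smith.com and john..doe@example.com.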
I attempted to use lookahead and lookbehind assertions to test for the consecutive periods, but I couldn't figure it out. I think that's the only thing it's missing (other than the comments, which for my purposes aren't required).
Is there a way to check for the periods that wouldn't alter what I currently have too much, or would it require a different approach?
Please let me know if I missed anything else.
Thank you.
You may add (?!("[^"]*"|[^"])*\.{2}) after ^.
See the regex demo.
The (?!("[^"]*"|[^"])*\.{2}) negative lookahead fails the match if, immediately to the right of the current location, there is:
("[^"]*"|[^"])* - 0 or more occurrences of either a " followed by 0+ chars other than " and then a ", or any single char other than "
\.{2} - two consecutive dots.
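To see the lookahead at work without the rest of the pattern, it can be anchored at ^ and tried on its own. This is just an illustrative sketch: the helper name hasNoUnquotedDoubleDot is made up here, and the / delimiters are an assumption.

```php
<?php
// Only the anchor plus the negative lookahead; it consumes no characters.
$lookahead = '/^(?!("[^"]*"|[^"])*\.{2})/';

// Hypothetical helper: true when no ".." appears outside a double-quoted run.
function hasNoUnquotedDoubleDot(string $lookahead, string $address): bool
{
    return preg_match($lookahead, $address) === 1;
}

var_dump(hasNoUnquotedDoubleDot($lookahead, 'john..doe@example.com'));   // bool(false)
var_dump(hasNoUnquotedDoubleDot($lookahead, 'john.doe@example..com'));   // bool(false)
var_dump(hasNoUnquotedDoubleDot($lookahead, '"john..doe"@example.org')); // bool(true)
var_dump(hasNoUnquotedDoubleDot($lookahead, 'bob.smith@smith.com'));     // bool(true)
```

In the full regex the lookahead sits right after ^ and the rest of the pattern then does the actual matching. Because the group inside the lookahead is a capturing one, the numbers of the later groups shift by one; writing it as (?:"[^"]*"|[^"]) should avoid that if the group numbers matter.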