I am learning regular expressions and have a task to write an expression that validates URLs (I have a specific list of URLs that must validate and a list that must fail). Here is what I currently have:
^((https?:\/\/)(?=.*[A-Za-z]+.*)(([A-Za-z0-9]+\-*[A-Za-z0-9]+|[A-Za-z0-9])\.){1,}([A-Za-z]+)\/?$)
Among all other URLs, these URLs must validate:
http://1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa
http://0test.com/
However these must fail:
http://1234567890123456789012345678901234567890123456789012345678901234.com
http://0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.com
They must fail because they have no letters in the domain name (only in the top level domain name), and I don't understand how to exclude them.
I have added a positive lookahead:
(?=.*[A-Za-z]+.*)
I was hoping that it would only check the following repeated group:
(([A-Za-z0-9]+\-*[A-Za-z0-9]+|[A-Za-z0-9])\.){1,}
but it checks the whole expression until the end, i.e. it also finds letters in the top level domain name. How do I solve this?
You have the right idea, but, as you said, you don't want the lookahead to take the top level domain name into account. So include a copy of the top level domain pattern at the end of your lookahead, which forces the letter it looks for to appear before the final dot:
(?=.*[A-Za-z]+.*\.([A-Za-z]+)\/?$)
                ^-----------^ will match the top level domain
                                ^ will ensure it's the last part of the domain
I also changed your A-z to A-Za-z (I wasn't sure if it was a typo, but as a reminder, A-z matches more than just letters).
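To see why A-z is broader than A-Za-z, here is a small Python sketch (Python is my choice for illustration, the question does not name a regex flavor). It lists the ASCII characters that sit between Z and a and checks that only the wider range accepts them:

```python
import re

# The six ASCII code points between 'Z' (90) and 'a' (97) fall inside
# the range A-z but outside A-Za-z.
extras = [chr(c) for c in range(ord("Z") + 1, ord("a"))]
print(extras)  # ['[', '\\', ']', '^', '_', '`']

wide = re.compile(r"[A-z]")
strict = re.compile(r"[A-Za-z]")

for ch in extras:
    assert wide.fullmatch(ch) is not None   # accepted by A-z
    assert strict.fullmatch(ch) is None     # rejected by A-Za-z
```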
EDIT: A lookbehind doesn't work here because most regex flavors don't allow variable-length matching inside it. I also added the \/? to allow for a possible trailing /.
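As a quick sanity check, here is a minimal Python sketch that drops the lookahead into the pattern from the question and runs it against the sample URLs. It assumes the optional slash sits before the end anchor (\/?$) inside the lookahead. The names URL_RE and is_valid are mine, and the last must-fail entry is a shortened stand-in for the very long all-zero URL, not the exact string from the question:

```python
import re

# The question's pattern, with the lookahead extended to include the
# top level domain and an optional trailing slash before the end.
URL_RE = re.compile(
    r"^((https?:\/\/)"
    r"(?=.*[A-Za-z]+.*\.([A-Za-z]+)\/?$)"
    r"(([A-Za-z0-9]+\-*[A-Za-z0-9]+|[A-Za-z0-9])\.){1,}"
    r"([A-Za-z]+)\/?$)"
)


def is_valid(url: str) -> bool:
    """Return True if the URL matches the pattern."""
    return URL_RE.match(url) is not None


must_pass = [
    "http://1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa",
    "http://0test.com/",
]
must_fail = [
    "http://1234567890123456789012345678901234567890123456789012345678901234.com",
    "http://" + "0." * 40 + "com",  # shortened stand-in for the long all-zero URL
]

for url in must_pass:
    assert is_valid(url), url
for url in must_fail:
    assert not is_valid(url), url
```

Both must-fail URLs are rejected by the lookahead alone, because the only letters they contain come after the final dot.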