
Regexp: check if repeated group contains a letter at least once


I am learning regular expressions and have a task: write an expression that validates URLs (I have a specific list of URLs that must validate and a list that must fail). Here is what I currently have:

^((https?:\/\/)(?=.*[A-Za-z]+.*)(([A-Za-z0-9]+\-*[A-Za-z0-9]+|[A-Za-z0-9])\.){1,}([A-Za-z]+)\/?$)

Among all other URLs, these URLs must validate:

http://1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa
http://0test.com/

However these must fail:

http://1234567890123456789012345678901234567890123456789012345678901234.com
http://0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.com

They must fail because they have no letters in the domain name (only in the top level domain name), and I don't understand how to exclude them.

I have added a positive lookahead:

(?=.*[A-Za-z]+.*)

I was hoping that it would only check the following repeated group:

(([A-Za-z0-9]+\-*[A-Za-z0-9]+|[A-Za-z0-9])\.){1,}

but it scans the rest of the string to the end, so it also checks the top-level domain name. How do I solve this?
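The behaviour described above can be reproduced in isolation. The following is a minimal sketch in Python's `re` module; the shortened URL `http://1234.com` is a hypothetical stand-in for the long digit-only examples, not one of the original test cases.

```python
import re

# The lookahead from the question, placed right after the scheme.
LOOKAHEAD = re.compile(r"https?:\/\/(?=.*[A-Za-z]+.*)")

# Hypothetical, shortened stand-in for the digit-only URLs that must fail.
url = "http://1234.com"

# The host label is all digits, yet the lookahead succeeds:
# `.*` lets it scan forward until it reaches the letters of "com".
sees_tld = LOOKAHEAD.match(url) is not None

# With the TLD removed there are no letters left, so the lookahead fails.
without_tld = LOOKAHEAD.match("http://1234.") is not None

print(sees_tld, without_tld)
```

This suggests the lookahead itself is fine; it simply has nothing that stops it at the boundary of the top-level domain.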


Solution

  • You have the right idea but, as you said, you don't want the lookahead to be satisfied by the top-level domain name. So include a copy of the top-level domain match inside your lookahead:

    (?=.*[A-Za-z]+.*\.([A-Za-z]+)\/?$)
                    ^-----------^ matches the top-level domain
                                    ^ ensures it is the last part of the domain
    

    I also changed your A-z to A-Za-z (I wasn't sure whether it was a typo, but as a reminder, A-z matches more than just letters: it also covers the characters [ \ ] ^ _ and `).

    EDIT: A lookbehind doesn't work here because most regex flavors don't allow variable-length matching inside a lookbehind. I also added the \/? to allow for a trailing /.
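To check the idea against the four URLs from the question, here is a rough sketch using Python's `re` module. It assumes the optional slash belongs before the `$` anchor inside the lookahead, so that a URL ending in `/` can still satisfy it. The URLs are rebuilt programmatically, so the exact number of repeated labels is an approximation of the originals, and `is_valid` is a hypothetical helper name.

```python
import re

# The question's pattern, with the lookahead replaced by one that must
# find a letter *before* the final ".tld" (optionally followed by "/").
URL_RE = re.compile(
    r"^((https?:\/\/)"
    r"(?=.*[A-Za-z]+.*\.([A-Za-z]+)\/?$)"
    r"(([A-Za-z0-9]+\-*[A-Za-z0-9]+|[A-Za-z0-9])\.){1,}"
    r"([A-Za-z]+)\/?$)"
)


def is_valid(url: str) -> bool:
    """Return True when the whole URL matches the pattern."""
    return URL_RE.match(url) is not None


# Same shape as the examples in the question (label counts approximate).
must_pass = [
    "http://1." + "0." * 31 + "ip6.arpa",
    "http://0test.com/",
]
must_fail = [
    "http://" + "1234567890" * 6 + "1234.com",
    "http://" + "0." * 125 + "com",
]

for url in must_pass:
    print("should pass:", is_valid(url))
for url in must_fail:
    print("should fail:", is_valid(url))
```

The digit-only hosts are rejected because the only letters in them belong to the top-level domain, and the lookahead now requires a further `.` plus letters after the letter it finds.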